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1  Statement  of  Problem 

First,  in  the  area  of  traditional  methods  model  calibration  we  have  continued  developing  tools  for 
efficient  global  optimization  of  complex  environmental  models.  In  particular,  we  are  refining  hybrid 
optimization  algorithms  that  blend  evolutionary  strategies  for  global  optimization  with  local  search 
based  on  a  locally  const ructed  surrogate  model.  A  conference  presentation  was  given  on  this  topic 
and  au  article  apj>eared  in  the  conference  proceedings  <  (2);  attached  in  Appendix  A).  Moreover,  a 
journal  article  was  prepared  and  submitted,  (  [1]:  attached  in  Ap|>endix  B).  Most  recently  we  haw 
begun  utilizing  state  of  the  art  techniques  for  constructing  certain  local  quadratic  polynomial  models 
with  special  approximation  properties  il  l]  to  utilize  in  a  local  search  step  that  is  interwoven  between 
iterations  of  an  evolutionary  strategy. 

Second,  we  are  exploring  surrogate  modelling  techniques  for  t  he  accelerat  ion  of  Bayesian  model  cal¬ 
ibration.  In  Bayesian  model  calibration  Monte  Carlo  simulation  is  used  to  infer  approximate  posterior 
probability  distributions  of  the  parameters  given  the  ol>served  data,  hut  many  model  runs  are  required 
for  this  inference.  We  have  recently  liegun  collecting  data  from  previous  model  runs  to  interpolate 
future  estimates  of  the  posterior  probability.  These  estimates  are  used  to  filter  out  unnecessary  model 
runs,  thus  increasing  the  efficiency  of  the  Monte  Carlo  simulation. 
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2  Results  Summary 

2.1  Introduction 


Computer-based  calibration  of  environmental  models  generally  involves  minimization  of  an  “objective 
function ‘-a  measure  of  niodel-to- measurement  misfit.  In  the  watershed  modeling  context  it  could  be 
specified  as  differences  !>etween  measured  and  modeled  stream  flows  (see,  for  example,  [3,5]). 

An  important  consideration  in  assessing  the  performance  of  parameter  estimation  software  is  that 
of  run  time.  Parameter  estimation  software  must  run  the  model  to  lx?  calibrated  many  times  in  the 
course  of  minimizing  the  objective  function.  For  models  in  which  the  objective  function  surface  contains 
multiple  local  minima,  model  run  times  are  high,  or  when  multiple  prediction  specific  calibrations  must 
he  conducted  (S’.  it  Ls  particularly  important  that  the  parameter  estimation  software  conducts  a  minimal 
number  of  model  runs. 

Our  parameter  estimation  software  is  called  the  Model  Independent  Calibration  and  Uncertainty 
analysis  Toolbox  (MICTT)  and  it  is  compatible  with  the  PEST  model  independent  protocol  [i  making 
it  immediately  usable  by  a  large  community.  MIOUT.  while  compatible  with  PEST,  is  distinctive  in 
that  it  employs  more  efficient  algorithms  for  calibration  of  complex  models  than  PEST  [9].  If  a  single 
model  run  tab's  hours  or  days  to  execute  this  is  significant.  The  primary  focus  of  my  research  is  on 
the  further  devclpmcut  of  efficient  tools  for  model  calibration  and  uncertainty  analysis. 

For  calibration  of  models  in  which  the  objective  function  is  not  smooth  or  has  many  local  min¬ 
ima  the  use  of  traditional  gradient-based  search,  even  when  combined  with  a  global,  stochastic  search 
strategy  is  at  lx'st  inefficient  and  at  worst  fails  completely.  For  such  models,  one  approach  is  to  use 
evolutionary  strategies,  in  which  survival  of  the  fittest  is  used  to  adapt  the  best  parameter  sets  until 
a  global  minimizer  is  found.  One  of  the  best  evolutionary  strategies  to  date  is  the  Covariance  Matrix 
Adaptation  Evolution  Strategy  (CM AES:  |f>)). 

2.2  The  Covariance  Matrix  Adaptation  Evolution  Strategy 

The  CM  AES  is  an  evolution  strategy  that  adapts  the  covariance  matrix  of  a  normal  search  distribution. 
The  user  specifies  a  population  size  of  A  individuals.  The  randomly  selected  initial  population  is 
After  evaluating  the  objective  function,  the  best  fi  individuals  (with  smallest  values)  are  selected 
and  their  center  of  mass  Ls  computed  by  (x)vv  =  51^-1  ^ j  a  a-  The  recombination  weights,  a’*.,  are 
positive  and  sum  to  one.  x\fter  selection  and  recombination,  a  new  population  Is  created  accordig  to 
x\nxl)  =  (i}ir  -f  rT,fVOf>:^.  The  covariance  matrix.  and  the  step  size.  <r|r*  are  updated  after 
each  generation  [€>].  The  are  randomly  sampled  from  the  mult  ivariate  st  andard  normal  distribution. 
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On  convex-quadratic  functions  CMAES  converges  log-linearlv  after  an  adaption  time  that  scales 
between  0  and  n2.  where  ;j  is  the  number  of  parameters.  For  multimodal  objective  functions.  CMAES 
does  a  surprisingly  good  job  at  finding  the  global  minimum,  but  the  population  size  must  be  increased 
significantly  and  with  considerable  problem  variation,  to  ensure  global  convergence  [7].  For  the  syn¬ 
thetic  problems  and  watershed  models  we  have  thus  far  considered  the  CMAES  algorithm  yields  slower 
convergence  on  average  than  the  hybrid  algorithm  using  the  Levenberg-Marquardt  for  local  search  and 
the  Multi-Level  Single  Linkage  algorithm  for  glolxal  search  [9],  thus  we  are  attempting  to  accelerate  the 
convergence  of  CMAES  while  maintaining  its  robustness  for  finding  good  approximations  to  the  glo!>al 
minimizer. 


2.3  Hybrid  Evolutionary  Strategy  and  Local  Search 

Optimization  of  expensive  functions  is  more  feasible  with  algorithms  that  require  fewer  evaluations 
of  the  objective  function.  In  that  spirit,  we  propose  a  hybrid  evolution  strategy.  As  in  [10,11,16], 
the  proposed  algorithm  combines  an  evolution  strategy  (ES>  with  a  gradient  search  technique.  The 
propped  algorithm  differs  from  the  algorithm  ol  Tahk.  et  al  [10],  in  that  it  uses  a  local  radial  basis 
function  approximation  to  the  objective  function,  instead  of  finite  differences,  to  compute  approximate 
first  and  second  derivatives  to  the  objective  function  surface.  The  derivative  information  is  used  to 
propagate  a  gradient  individual  alongside  the  evolving  population.  The  gradient  individual  is  included 
for  possible  selection  each  generation.  The  use  of  radial  basis  functions  allows  the  derivatives  to  be 
estimated  without  imposing  any  special  structure  on  the  evolving  |x>pulatiou  of  points  as  is  required 
in  [10].  This  enables  the  use  of  the  CMAES  which  proved  to  be  rather  sensitively  t\uied  and  evidently 
breaks  when  the  evolving  imputation  is  perturbed  st  rongly  from  its  random  Gaussian  distribution. 

Tests  on  a  small  suite  of  standard  test  functions  and  a  hydrologic  application  show  that  this  hy¬ 
brid  approach  can  greatly  accelerate  the  covariance  matrix  adaptation  evolution  strategy  (CMAES) 
requiring  many  fewer  evaluations  of  the  objective  function  than  staudard  CMAES  at  a  greater  cost 
per  optimization  iteration  than  standard  CMAES.  However,  the  extra  computational  time  would  be 
incident  al  for  a  complex  environmental  model. 

Complete  details  of  the  method  and  the  numerical  tests  can  lx^  found  in  the  attached  conference 
pa|x^r  [2  in  Appendix  A  and  manuscript  submitted  to  Engineering  Optimization  *1]  in  Appendix  B. 

The  current  version  of  the  hybrid  CMAES/ rbf-assisted  local  search  requires  that  the  objective 
function  be  reasonably  smooth,  but  this  is  often  not  the  case  for  environmental  models.  Two  new 
variations  of  the  hybrid  algorithm  are  currently  being  developed  that  utilize  local  models  requiring 
less  smoothness  [1 1. 15].  Preliminary  efforts  using  these  local  models  to  calibrate  real  surface  water 


hydrological  models  have  been  very  promising.  Future  work  will  include  incorporating  one  or  both  of 
these  new  hybrid  algorithms  into  our  Model  Independent  Calibration  and  Uncertainty  analysis  Toolbox 
where  they  will  available  for  application  to  problems  of  interest  for  the  Coq>s  of  Engineer  and  other 
researcher's. 


2.4  Efficiency  enhancements  for  Bayesian  Model  Calibration 

In  Bayesian  Model  Calibration,  the  model  parameters  are  treated  as  random  variables  with  unknown 
distributions.  If  any  information  about  the  parameters  is  known  in  advance  this  is  specified  as  a  prior 
distribution.  In  the  hydrologic  context  usually  only  hounds  on  the  parameters  are  known  so  only  a 
uniform  or  uninformative  prior  distribution  is  typically  assumed  for  each  parameter.  The  idea.  then, 
is  to  find  the  posterior  probability  distribution  of  the  parameter  sets  given  the  olxserved  data.  Because 
the  probability  distribution  of  the  data  is  unknown  this  is  often  done  by  Monte  Carlo  simulation  using 
a  so-called  Monte  Carlo  Markov  Chain  (MCMC)  sampler.  There  are  many  such  samplers,  of  which 
one  that  is  particularly  popular  in  the  hydrological  community  is  the  Sliutflod  Complex  Evolution 
Metropolis  (SCEM-UA)  sampler  [1-i] .  As  with  all  MCMC  samplers,  the  convergence  to  the  posterior 
distribution  of  the  parameters  can  l>e  quite  slow  and  may  require  thousands  or  tens  of  thousands  of 
model  evaluations.  Tints,  while  a  complete  characterization  of  the  probability  distribution  is  desirable, 
it  may  be  prohibitively  expensive. 

Fortunately,  many  of  the  newly  sampled  ]>oints  in  the  MCMC  sequence  may  be  from  areas  in 
parameter  space  in  which  the  posterior  distribution  is  already  well-sampled  and  additional  sampling 
may  not  lead  to  new  information.  To  check  for  this,  we  store  all  previously  sampled  points  in  a  database 
along  with  the  values  of  the  log-likelihood  function  (used  in  the  estimation  of  the  posterior  probability). 
At  each  newly  sampled  point  locally  weighted  projection  regression  (LWPR:  [12] )  is  used  to  estimate 
the  posterior  probability.  Parameter  points  which  are  unlikely  to  l>e  accepted  in  the  current  Markov 
chain  based  on  their  LWPR  estimated  values  are  rejected.  The  actual  model  Is  evaluated  and  the  real 
posterior  probability  calculated  for  parameter  points  which  were  judged  jH^ihlv  acceptable  by  their 
LWPR  estimates.  To  assure  convergence,  of  the  Markov  chains  it  was  found  necessary  to,  at  random, 
occasionally  evaluate  the  true  posterior  probability  at  a  proportion  of  the  rejected  points  as  well.  This 
approach  resulted  in  a  20%-3 0%  savings  oi  model  evaluations  while  converging  to  comparable  posterior 
probability  distributions.  While  this  was  a  good  first  effort,  we  are  attempting  to  refine  the  approach  to 
yield  at  least  50%  savings  before  publishing  the  method.  A  preliminary  report  is  attached  in  appendix 
C. 
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Abstract 

Optimization  of  expensive  functions  is  more  feasible  with  algorithms  that  requite  fewer  evaluations  of 
the  objective  function.  In  that  spirit,  this  paper  proposes  a  hybrid  evolution  strategy.  As  in  [1 1.  9.  10). 
tlie  proposed  algorithm  combines  an  evolution  strategy  (ES)  with  a  gradient  search  technique.  The 
proposed  algorithm  iliffeis  in  that  it  uses  a  local  radial  basis  function  approximation  to  tin-  objective 
function  to  compute  approximate  first  and  second  derivatives  to  the  objective  function  surface.  The 
derivative  information  is  used  to  propagate  a  gradient  individual  alongside  the  evolving  population.  The 
giadieut  individual  is  included  for  possible  selection  each  generation.  Tests  on  a  small  suite  of  standard 
test  functions  and  a  hydrologic:  application  sIm.vv  that  this  hybrid  approach  can  greatly  accelerate  the 
covariance  matrix  adaptation  evolution  strategy  (CM AES).  This  hybrid  approach  is  flexible  and  requires 
little  modification  of  an  existing  evolution  strategy:  thus,  it  does  not  seem  to  alter  negatively  affect 
convergence  wlten  an  objective  function  does  not  have  sufficient  smoothness  for  derivatives  to  yield  useful 
descent  information. 

Keywords:  Hybrid  algorithms:  Evolution  strategies:  Quasi-Newton  method 

1.  Introduction 

For  multi-modal  fitness  functions  evolution  strategies  are  known  to  be  reliable,  but  slow  for  approximating 
global  optima.  A  large  population  size  is  required  for  the  strategy  to  explore  the  real  parameter  space,  but 
this  slows  local  convergence.  In  contrast,  classical  gradient-based  algorithms  are  exploitative  in  nature 
and  converge  quickly  to  local  minima,  but  they  are  not  good  at  finding  the  glolral  minimum. 

The  algorithm  proposed  In  this  paper  follows  the  hybrid  evolution  strategy  fust  outlined  in  |1 1).  but 
differs  in  two  respects:  first  and  meet  importantly,  a  mote  accurate  and  mote  flexible  metliod  is  utilizes! 
to  approximate  the  derivatives  required  for  the  local  seatch.  and  second,  the  evolution  strategy  is  the 
covariance  matrix  adaptation  evolution  strategy  (CMAES)  which  is  known  to  Ik:  particularly  effective 
at  approximating  global  optima.  CMAES  is  especially  effective  at  locating  global  optima  when  used  in 
conjunction  with  a  population  doubling  strategy  as  described  m  (1). 

The  derivative  estimation  method  will  lie  explained  fust  ami  it  will  be  shown  how  the  approximated 
derivatives  are  used  to  propagate  the  gradient  individual  between  generation.  Next  the  hybrid  algorithm 
will  he  summarized.  We  have  implemented  the  hybrid  algorithm  in  CMAES.  and  it  will  be  shown  how 
the  new  hybrid  algorithm  performs  on  a  suite  of  test  functions  in  10  dimensions.  Finally,  we  briefly 
demonstrate  an  application  in  hydrological  modeling. 

2.  Gradient  Individual  using  Radial  Basis  Functions 

The  foundation  of  this  optimization  algorithm  is  the  usual  evolution  strategy  in  which  new  offspring  are 
produced  at  each  generation  by  recombination  and  mutation  (see  Figure  1).  The  objective  function  is 
evaluated  at  each  of  these  offspring  and  the  fittest  offspr  ing  are  selected  as  parents  for  the  next  generation. 
In  the  hybrid  approach,  an  additional  individual,  called  the  gradient  individual  (9).  is  propagated  by  a 
different  mechanism  each  generation.  The  gradient  individual,  j'.  is  either  the  fittest  offspring  of  the 
current  generation  (individual  with  lowest  function  value)  or  the  gradient  individual  from  the  previous 
generation.  From  information  gathered  during  the  evolution  of  the  |K>puIation  tlie  fiist  and  second 
derivatives  of  the  objective  function  are  estimated  at  .r'  am!  used  to  perform  an  update  of  tlie  gradient 
individual  which  is  hopefully  moved  closer  to  a  stationary  point. 

.As  an  evolution  strategy  proceeds,  it  typically  does  not  use  tlie  previously  evaluated  points  beyond 
tire  current  generation:  however,  in  our  hybrid  strategy  we  store  tire  last  .V  points  and  tlreir  evaluated 
functions  values  m  a  database.  To  update  the  gradient  imlividual  we  use  a  I:- nearest  neighbor  local 
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function  approximation  of  the  objective  function  using  the  k  nearest  nciglil»is  (Euclidean  distance)  of 
r'n  in  the  database  to  construct  a  cubic  radial  basis  function  (ItBF)  approximation: 


* 

=  X]  “’■*  I !  ■ *  ~  x>  i )  +  *  €  R"  1 1  > 

1-1 

where  /,./  =  1.2 . A*  are  tin*  A-  nearest  neighbois  of  xlv  in  the  H-dimensional  search  space.  p  is  in  II  > 

(the  linear  space  of  poIymamaLs  m  n  variables  of  degree  leas  than  ur  equal  to  2).  and  $(r)  =  r*.  Cubic 
radial  basis  functions  were  selected  not  only  for  their  simplicity  and  differentiability,  but  also  because 
they  have  been  used  successfully  as  aurrogate  models  for  pre-evaluating  function  values  to  lessen  the 
number  of  function  evaluations  required  by  an  evolution  strategy  (8|. 

Define  the  matrix  •!»  €  £***  by 

(*).,  :=^(||*i-*>||a),  =  |2> 

Lit  ;(  =  (n  +  l)(n  +  2)/2  be  the  dimension  of  flj.  let  pi . Ik-  a  l«sis  of  this  linear  space,  and  define 

tin-  matrix  Pf  B1"1!'  follows: 

r„  i=  1 . k:j  =  l, (3) 

In  t his  model,  the  RBF  that  interpolates  the  points  (Xj,/(xi)) _ (xk.f(xk)}  is  obtained  by  solving  the 

system 

(a:)C)-U) 

where  F  =  (/(*, /<ar*))T.a.  =  (m . a-*)  €  R*  and  r  =  (cu...  ,c*)T  €  R’h.  Powell  (7)  gives 

sufficient  and  necessary  conditions  for  the  system  above  to  be  uniquely  solvable,  but  in  practice  the  real 
issue  can  be  that  the  coefficient  matrix  above  becomes  ill-conditioned.  However,  we  have  found  that 

simply  rescaling  anil  shifting  the  points  z, _ ,a(  so  that  they  all  lie  in  | —  1.  lj"  is  usually  sufficient  to 

address  this  issue. 

Once  the  RBF.  s(j-).  has  been  determined  by  Eq.fl).  then  s(x)  is  differentiated  analytically  to  determine 
approximations  to  the  gradient  and  Hessian  of  the  objective  function,  /(.r).  For  the  giadient  rector,  g. 
evaluated  at  the  gradient  individual,  : 

.V.  =  <V/(x')),  =  ± /«)  *  (V*«H,  =  *  =  » . »  (5) 

Fur  the  Ht^sian  matrix.  //.  evaluated  at  the  gradient  individual],  xj,: 

H„  =  =  T) ~r/(4)  *  >j  =  l.-.«  (G> 

Similar  techniques  for  derivative  approximation  are  routinely  used  in  the  solution  of  partial  differential 
equations  using  so-called  meshless  methods.  Moreover,  such  approximations  can  Ik-  spectrally  accurate 
|3|. 

Once  the  offspring  anil  t  heir  function  values  from  the  current  generation  have  been  appended  to  the 
database,  we  construct  t lie  RBF  as  above  and  determine  the  gradient  and  Hessian  approximated  at  the 
current  gradient  individual  z'v.  Finally,  a  new  gradient  individual  is  found  by  the  standard  update: 

V  <7> 

The  new  gradient  individual  is  then  added  to  t lie  current  generation  of  offspring  for  possible  selection. 
When  the  population,  and  hence  the  points  in  the  database,  are  sufficiently  dose  to  a  minimum  then  the 
gradient  and  Hessian  can  lie  accurate  and  can  yield  fast  local  convergence  to  the  minimum.  However, 
when  the  evolving  populat  ion  is  far  from  a  local  minimum,  then  the  gradient  and  Hessian  tend  to  Ik- 
inaccurate  and/or  at  least  the  update  does  not  lead  to  a  minimum.  It  s  worth  noting  that  the  addition 
of  this  gradient  individual  to  the  offspring  pool  for  selection  seems  to  never  be  harmful  to  the  search. 
This  metlrod  of  gradient-based  search  is  similar  to  a  quasi-Newton  method,  but  In  quasi-Newton  methods 
tire  first  derivatives  of  the  function  are  usually  available  (or  approximated  by  centered  finite  differences). 
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Figure  1:  Flowchart  for  the  hybrid  evolution  strategy  algorithm 

and  the  Hessian  matrix  is  updatid  iteratively  with  information  from  new  function  and  gradient  eval¬ 
uations.  In  fact,  tire  original  gradient  individual/evolution  strategy  hybrid  approaches  of  Talik.  et  al 
[9.  10]  used  the  quasi-Newton  method  to  advance  the  gradient  individual.  In  contrast,  in  this  proposed 
approach,  the  Hessian  matrix  is  completely  recomputed  at  each  generation  from  a  local  approximation 
to  the  objective  function  and  should  l»e  more  accurate  than  traditional  quasi-Newton  Hessian  approxi¬ 
mations. 

3. Covariance  Matrix  Adaptation  Evolution  Strategy  with  Gradient  Individual 
The  basic  outline  of  the  hybrid  evolution  strategy  algorithm  is  illustrated  hi  the  tlowchart  in  Figuie  1.  To 
study  the  efficacy  of  tills  version  of  tin-  hybrid  gradient  individual  /  evolution  strategy  approach  we  have 
implemented  the  algorithm  in  tire  context  of  CMAES.  While  we  have  also  done  this  in  the  context  of 
tin*  standard  evolution  strategy,  we  use  CMAES  here  because  it  seems  to  have  better  gloli.il  convergence 
properties.  We  want  to  see  if  tire  addition  of  the  gradient  individual  to  tire  offspring  at  each  generation 
will  negatively  affect  the  convergence  of  the  CMAES.  in  particular,  we  utilize*  the  Matlah  version  of 
CMAES  (version  2.54)  provided  public-ally  by  Hansen  |G). 

CMAES  is  an  evolution  strategy  that  adapts  a  full  covariance  matrix  of  a  normal  search  distribution 
|5|.  Tire  strategy  begins  with  ait  initial  population  of  A  individuals  x^|;>.  After  evaluating  the  objective 
function,  the  l>est  /i  individuals  are  selected  as  patents  and  tlreir  centroid  is  computed  by  using  a  weighted 
average:  (x}^"-1  =  £{!_,  "  where  the  weights,  w„  are  positive  reals  anil  sum  to  one.  The  notation 
Xj.,  is  called  selection  notation  and  represents  the  point  with  t  he  k"1  lowest  corresponding  objective  fitness 
value.  While  many  weighting  schemes  have  been  proposed,  here  we  use  the  super-linear  weights:  ir,  = 

ln(;i)  -  ln(t),i  =  1 . ;j.  wherein  tire  individuals  with  lowest  fitness  values  gee  the  highest  recombination 

weights. 

After  selection  anti  recombination  a  new  population  is  created  by: 

=  (8, 

where  zk  **  Y(0. 1)  are  independent  realizations  of  an  n-dimensional  standard  normal  distribution  with 
mean  2eio  and  covariance  matrix  I.  Tire  base  jxjint*.  are  rotated  and  scaled  by  tin*  eigenvectors.  B|f>. 
anil  tin*  square  root  of  the  eigenvalues  .  DM.  of  the  covariance  matrix  .  CM.  The  covariance  matrix.  CM. 
ami  global  step  size.  «*M.  are  u|x]au\l  after  each  generation.  This  approach  yields  a  strategy  that  is 
invariant  to  any  linear  transformation  of  the  search  space.  Equations  for  initializing  ami  u|Klating  the 
strategy  parameters  are  given  m  [4]. 
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Table  1:  Pseudocode  of  the  hybrid  algorithm. 

1:  Generate  and  evaluate  initial  population  x)'  ’i:A 

2:  Append  these  points  to  the  database 

3:  Choose  best  in<lividu;tl  and  set  x# 

4:  Set  1  =  0.0,=  Oort,  //„  =  C<°>  =  In,,. 

5:  repeat 

G:  Compute  nearest  neighbor  RBF  and  find  g,  and  //, 

7:  Update  the  gradient  individual  jJ*  +  1)  =  x9  -  < II, )-'g, 

8:  Evaluate  a^*1*  and  append  to  the  database  if  feasible 

0:  Select  the  best  ;i  individuals  from  the  populat  ion  and  x'a  * 1 1 

1(1:  Generate  new  populat  ion  by  recombination  and  mutation 

11:  Evaluate  new  |Mipulation  and  np|>cud  to  database. 

12:  if  min./tr,)  <  /(r$°)  then 

13:  swap  individual  with  lowest  function  value  and  i" 1 

14:  end  if 

15:  I  =  I  +  1 

1G:  until  Stopping  criteria  are  satisfied 

Pseudo-code  for  the  CMAES-RBFGI  algorithm  is  shown  in  Table  1.  One  additional  feature  that  has 
not  been  mentioned  is  that  a  weighted  norm  is  used  to  compute  nearest  neighbors  for  determining  the 
support  of  the  local  radial  basis  function  approximation.  We  use  the  current  covariance  matrix  of  the 
(’MAES  since  it  should  reflect  the  shape  of  search  distribution  and  the  objective  function  surface.  The 
eigenvector/eigenvalue  decomposition  of  the  current  covariance  matrix  is  C  =  B D~Br .  Tin*  distance 
between  a  point  x  6  R"  and  the  current  gradient  individual  xv  is  measured  as  | </?/->) _  1  < j-  -  xv)\t.  (For 
instance,  a  unit  ball  in  this  norm  will  Ik-  elliptically  shaped  to  fit  in  a  long  narrow  valley  in  the  search 
space.)  The  nearest  neighbors  in  this  norm  should  be  ideal  |H>ints  for  constructing  an  approximation  to 
the  Hessian  matrix. 

4.  Hybrid  Algorithm  applied  to  test  suite 

A  small  suite-  of  test  problems  has  been  selected  to  compare  tin1  performance  of  our  hybrid  CMAES 
gradient  individual  algorithm  (CMAES-RBFGI)  with  ordinary  CMAES.  For  comparison,  we  have  also 
implemented  tin-  quasi-Newton  Hessian-approximation  gradient  individual  approach  of  Tahk.  et  al.  (HI) 
in  the  context  of  (’MAES  (their  implementation  was  in  the  standard  evolution  strategy);  we  refer  to  tins 
implementation  as  C.MAES-QNGI. 

In  CMAES-QNGI,  after  recombination  (finding  tin-  centroid  of  the  selected  parents),  only  the  fust  half  of 
tire  new  population  of  individuals  is  generated  by  Eq.8.  After  evaluating  the  objective  function  for  these 
individuals,  the  current  gradient  individual  is  swapped  for  an  individual  with  lower  objective  function 
value,  if  one  exists,  to  ensure  that  the  gradient  individual  is  the  current  lK-st  point.  The  remainder  of 
live  current  population  is  formed  by  leflecting  tin*  fiist  half  of  the  population  symmetrical  through  the 
gradient  individual  in  3".  This  symmetrically  selected  population  reduces  the  order  of  the  discietization 
errors  in  foiming  the  first  and  second  derivative  approximations  at  the  gradient  individual. 

A  summary  of  the  selected  test  functions  is  shown  in  Table  2.  Functions  /(.  the  classical  splrere  function, 
and  f  j  are  simple  unimodal  functions.  /,  is  a  simple  unimiKlal  function  with  random  noise.  /t  is  a  simple 
nnimoda)  function  but  is  not  differentiable  at  the  minimum  (oi  at  many  other  points).  /-,  is  the  classical 
Rosenbroch  function  which  has  two  minima  and  a  long  flat  valley.  /?.  ami  /»  are  all  multimodal.  The 
Rastiigin  function.  /?  picsents  some  difficulty  for  CMAES.  while  Ackley  function  has  tin'  feature  that  it 
is  not  differentiable  at  the  gtolhil  minimum.  All  of  tlve  functions  have  their  glolwd  minimum  value  of  zero 
at  zero. 
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Table  2:  Test  Functions  fen  Hybrid  Optimization  Strategy 


Name 

IMimtion 

Sphere 

(-40.  GO)" 

Schwefel  1.2 

(—40.00)” 

h{*)  =  /zlj-)(l  +  "1:1  where  :  is  Af(U.  1J" 

|-40.W)|" 

Schwefel  1.5 

I  t: 

u[x) = y.  m + n  w 

»^i 

(—40.  GO]" 

Rosenhrodc 

/.la  1  =  Y,  [lOO  (*•>.  -  *?)’  +  (Xi  -  if 

J-l  ^ 

[—40,60]" 

Griewank 

Rastrigin 

[-40,60]" 

Ackley 

(-32. 32]" 

Figure  2  shows  convergence  graphs  for  each  of  the  test  functions  for  each  of  the  three  algorithms: 
CM  AES,  CMAES-RBFGI.  CMAES-QNGI.  For  each  function  tlx-  dimension  is  set  at  ;i  =  111.  Tlie  pop¬ 
ulation  size  is  A  =  30  with  ji  =  15  parents  being  selected  at  each  generation.  Tire  initial  glolial  step 
size-,  a  is  set  to  3091  of  the  total  length  of  the  search  domain  in  each  dimension.  The  initial  population 
is  sampled  from  a  uniform  distribution,  and  the  same  samples  are  used  to  initialize  each  of  the  three 
algorithms.  For  tire  CMAES-RBFGI  algorithm,  the  local  RBF  approximation  is  constructed  using  tin-  k 
nearest  neighbors  with  k  =  fl.5(n+  l)(n  +  2)/2]  =  99  which  is  just  509!  larger  than  the  minimum  number 
of  points  necessary  to  construct  a  quadratic  polynomial  interpolant  in  R".  The-  k  neatest  neighbors  are 
selected  from  among  tire  last  AT  =  2 k  =  198  individuals  that  have  been  evaluated.  For  each  algorithm. 
30  trials  are  conducted  for  each  test  function.  The  algorithm  is  stopped  when  a  minimum  objective 
function  value  of  10-,l>  is  reached  or  when  the  best  objective  function  value  does  not  change  fin  the  last 
111  +  30w/A  generations  or  when  tire  ratio  of  tin*  range  of  the  current  function  values  to  the  maximum 
current  function  value  is  below  Toi.Fun=  5  x  10~10. 
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Figure  2.  Convergence  «.f  tin*  median  objective  value  over  30  I  rials  fui  10  dimensional  lest  function*. 

The  convergence  graphs  shown  in  Figure  2  illustrate  several  things.  First,  the  approach  of  Tulik.  ot  nl. 
(9.  10]  appear*  to  interfere  with  the  convergence  of  CMABS.  This  does  not  seem  to  lie  a  cane  of  simple 
error  in  implementing  their  strategy  an  we  have  been  able  to  successfully  implement  and  tost  it  in  the 
context  of  a  standard  evolution  strategy.  Rather  the  synimetriBition  of  the  p<>)iulatioii  at  each  generation 
seems  to  interfere  with  the  finely  timed  covariance  matrix  adaptation  and  stop  sire  adaptation  algorithm 
in  tlit-  version  of  ('MAES.  Tin*  RBF  gradient  individual  approach  dotw  not  have  this  difficulty  since  it 
docw  not  nioilify  tire  existing  population  of  the  evolution  strategy  ill  any  way.  but  simply  ap)>ends  an 
extra  individual  to  the  population.  Secondly,  for  simple,  smooth  convex  functions  the  C’MAES-RBFCil 


algorithm  Is  great  ly  accelerated  relative  to  CMAES  as  can  be  seen  lot  functions  J,  :  Sphere  and  /■  : 
Sdiwefel  1J2.  When  there  is  atniug  noise  in  the  objective  fuuctiuu  we  would  not  expect  a  gtvulteut-hustd 
search  tu  perform  well:  ns  can  be  seen  in  with  ft.  there  is  no  aecelerntkiu  with  CMAES-RBFGL  Fuuetiims 
/,  :  Schwefel  1.5  (uiiiumdol)  mu!  /,  Ackley  (multimodal)  nre  not  differentiable  at  the  global  tuhiiiuuiu 
(both  have  n  sharp  points).  As  can  he  seen  in  the  Bgure.  little  acceleration  is  achieved  by  CMAES- 
RBFGI.  I  lu  a  rest  iuit  shown  here,  I  he  Ackley  function  was  squared  to  rninove  the  singularity  in  the 
first  derivative  anil  the  result  appealed  wore  hki-  that  of  /<,.(  Fur  the  Rucnhioct  function.  /&  and  the 
Griewank  function.  tin1  convergence  graphs  show  tluit  the  convergence  of  CMAES-RBFGI  accelerates 
dramatically  as  the  population  approaches  the  iniuimixer.  Finally,  for  the  Rastrigin  function.  J-  there 
is  very  little  difference  between  CMAES  and  CMAES-RBFGI  because  both  algorithms  have  difficulty 
locating  the  global  iniiiiiuuiii.  Instead  llu'V  converge  to  a  variety  of  local  iniiiiina. 

As  for  reliability,  the  algorithms  are  nearly  indisiiguishablc  Foi  fu  ml  ions  f\ - J,  both  CMAES  and 

CMAES-RBFGI  found  the  global  optimum  for  .ill  trials.  For  the  Rowubrock  function,  /r,  both  algorithms 
located  the  global  optimum  m  28  of  the  3(1  trials.  Fbt  the  Griewank  funct  ion.  J,,  CMAES  found  tin'  gli.luil 
optimum  11  times  while  CMAES- RBF-G1  found  it  23  times.  Neither  algorithm  ever  found  the  glulml 
optimum  for  the  Rostrigin  function,  J-.  For  the  Ackley  function,  /.  CMAES  only  failed  to  find  the  global 
optimum  once,  while  CMAES-RBFGI  found  it  for  all  30  trials.  These  results  are  promising  as  they  show 
tluit  the  addition  of  the  glad  lent  uuhvidual  does  uot  Impede  the  global  search  ca|M)liiUty  ol  the  evolution 

strategy’. 


Figure  3:  Convergence  cuuipmisuu  for  Rostrigin  function  with  nstarta 

For  the  Rastrigin  function  we  rouduct  a  new  experiment  in  winch  we  run  the  evulutiun  strategy  using 
a  imputation  douhliug  restart  strategy  (1).  Each  variation  of  CMAES  is  run  until  convergence  as  above 
with  same  A  =  311  as  above  and  then  the  population  size  A  is  doubled  while  /i  =  A/2  and  the  algorithm 
restarted.  For  CMAES-RBFGI  the  database  of  ixiiuts  is  maintained  between  restarts.  A  total  of  l 
restarts  are  olkiwt>d  to  a  maximum  popuhit  ion  she  of  1811.  The  maximum  iiuiuIk’i  of  function  evaluations 
remaius  capped  at  "id.tMUl.  The  CMAES  |M.puIatiou  size  Increasing  strategy  has  Ixtu  sluiwn  to  he  oue  of 
the  mist  successful  glulial  op!  imitation  met  buds  (ueseutly  known  for  some  brmhuuuk  problems  |lj.  The 
pouveigeuce  graph  for  the  meduui  function  value  over  30  trials  is  slurwu  in  Figure  3.  Most  significantly. 
CMAES-RBFGI  demonstrates  far  greater  leliahillty.  CMAES  correctly  locates,  within  the  alloted  SU.flOO 
function  evaluations,  the  global  minimum  at  into  in  5  of  the  30  Utah,  while  CMAES-RBFGI  tiuils  the 
gloiiid  miminum  m  22  of  the  30  trials. 

5.  Application  to  calibration  of  a  watershed  model. 

To  demonstrate  the  utility  of  the  CMAES-RBFGI  algorithm  in  a  practical  setting  we  usi.l  the  algorithm 
to  calibrate  HYMOD,  a  live- parameter  concept  md  rainfall-runoff  mnilrl  (see  Figure  5),  Introduced  in  [2]. 
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hi  -'hint,  given  time  aerie:  uf  daily  precipitation  (/’)  and  evapotran  spiral  ion  (£T)  data  the  objective 
is  tune  the  parameters  so  that  the  least  squares  error  between  the  model  predicted  stream  flow  time 
series  and  the  observed  stream  flow  time  series  is  minimized.  Such  problems  are  usually  characterized  by 
multiple  minima,  sometimes  unidentifiable  parameters  and  even  discontinuities  in  the  objective  function. 


Figure  4:  Schematic  representation  of  the  I1YMOD  model:  fiom  (12] 


Tlie  HYMOD  model  is  a  simple  rainfall  excess  model  (details  in  [Moore  19&5]),  connected  with  two  series 
of  linear  reservoirs:  throe  identical  quick  and  a  single  slow  response  reservoir.  The  five  parameters  to  lie 
calibrated  for  the  model  stream  flow  to  match  t ho  observed  stream  flow  data  are:  the  maximum  storage 
capacity  of  the  catdiement.  f  Ynax:  tl*“  degree  of  spatial  variability  of  the  soil  moisture  capacity.  \.Xp: 
tire  factor  distributing  flow  between  the  two  series  of  reservoirs.  Alpha:  and  the  residence  time  of  the 
linear  quick  and  slow  reservoirs.  /?,  and  II,.  respectively. 

Three  years,  Oc tolar  1,  1948.  to  September  30,  1951.  of  daily  hydrologic  data  from  the  Leaf  River 
watershed  were  used  for  mode)  calibration.  This  humid  1011  km*  watershed  is  located  north  of  Collins. 
Mississippi.  The  data,  obtained  from  tlie  National  Weather  Service  Hydrology  Labratory,  consist  of  mean 
areal  precipitation  (mm/d),  potential  evapotranspiralion  (mm/d),  ami  stream  flow  (m3/s). 


Table  3:  Uncertainty  ranges  of  HYMOD  model  parameters 


Minimum 

Maximum 

Tiut 

(max 

1.000 

500.000 

mm 

foxp 

0.100 

2.000 

Alpha 

0.100 

0.990 

H, 

0.000 

0.300 

day 

0.000 

0.990 

day 

Tlie  ('MAES  and  CMAES-RBFGI  algorithms  are  each  applied  to  the  optimization  or  calibration  of  the 
HYMOD  model  for  30  trials.  The  initial  ranges  of  the  parameter  values  are  shown  iu  Table  3.  As  for  the 
test  functions  discussed  above,  the  algorithms  are  initialized  with  the  same  uniform  initial  distributions. 
We  set  A  =  10  and  /i  =  5.  Each  algorithm  stops  when  T<u.Fi’N=  5c  -  4.  as  described  previously.  For 
CMAES-RBFGI  the  number  of  nearest  neighbors  used  for  the  local  RBF  is  A-  =  [1.5(n  +  l)(n+2)/2]  =  23 
chosen  from  the  last  A'  =  21-  =  10  evaluated  points. 


tuncton  ovaic 

F'igurr  V  Convergence  of  CM  AES  and  CMAES-IU3FGI  fm  the  I1YM0D  model. 

Then*  are  many  lucid  minima,  Iml  C’MAES  and  CMAES-RI5FGI  nearly  always  cuuveige  tu  uue  uf 
two  minima  x,  =  ( 157.07*10. U..U40, 0.2370.  0.2G24, I1.S178)  where /|m,OD(x,)  =  138. Wifi  ur  x2  = 
( 1 10 J1808. 0.7 1 0a. 0.21 1  (i. 0.20 l'J, O.SJl  1 :«)  where  /nYMOD<xJ>  =  The  global  minimum  ap|K*ors 

to  be  at  x,  but  CMAES  cuuvergea  tu  x2  m  28  uf  :<0  trials.  while  CMAES-RBFGI  converges  tu  the  same 
lulutumui  iu  27  uf  IK)  trials.  Evidently,  the  basin  uf  iittiacUnii  fur  the  global  minimum,  Xj.  is  i|uite  small 
as  huth  algoiitlims  hove  tiuuble  Ibuling  It.  The  luielerated  cotiveigeutv  ul  CMAES-RBF  to  the  local 
minimum  uf  the  HYMOD  model  is  demuustraled  in  Figure  5.  Rir  each  tiiid  the  best  limetkiti  value 
is  saved  at  eadi  generation.  The  mediiui  fuuctiuu  value  over  the  thirty  trials  minus  the  value  at  the 
lucid  minimum  ./(Xj),  is  pint  ted  versus  the  number  of  fuuctiuu  eudu.it li ms.  As  Flute  5  deuuaistrales. 
tin*  Increase  lu  omvergeuee  speed  is  quite  dramatic:  CMAES- RBFGI  I  vpicnlly  converges  with  fewer  than 
half  uf  the  objective  function  evaluatiuns.  Though  neither  algorithm  reliably  lucales  the  glided  minimum, 
buth  algorithms  give  guud  approximations  tu  tin1  glulud  minimum  that  piudure  adequate  approxluuitiuus 
tu  the  dally  stream  Bows.  Tu  Ionite  the  glubal  minimum  reliably,  a  IBS  tart  strut  egy  could  be  used  as  with 
the  Rastrigui  fuuctiuu  above.  The  RBFGI  method  would  still  accelerate  the  convergence  significantly. 

fi.  Cunclusiuns 

The  gindient  indlvhlmil  hybiidizatum  appinarh  for  evolution  stmtegha*  lias  been  shown  to  he  effective 
for  siguiflciuitly  accelerating  the  convergence  of  the  covuriaucc  matrix  adaptation  evolution  strategy. 
Likewise,  it  also  works  with  the  standard  evolution  strategy,  though  the  tesults  ate  not  shown  here.  To 
develop  a  hybrid  evolution  strategy  using  local  RBF  appiuximatiuu,  .is  we  have  done  here,  requires  very 
little  modification  of  the  uctiud  evolution  strategy.  Tin-  evolving  population  itself  is  not  modified,  but  the 
additional  gradient  individual  is  addid  at  each  generation.  In  the  gradient  individual  approach  of  Tahk. 
et  al  the  imputation  is  diosen  symmetrically  at  each  generation  and.  as  seen  here,  this  can  interfere  with 
tin*  convergence  of  ('MAES.  Allot hei  advantage  to  this  approach  is  that  no  minimum  po|iulatiun  sire  is 
lequiied.  Iu  [9.  10]  the  population  sire  must  lie  at  least  twice  t  he  dimetision  of  t  he  search  space  to  est  imate 
the  gradient  vector.  The  downside  of  RBFGI  approach  is  tluit  it  i.-.  expensive  to  form  the  coefficient  matrix 
iu  Eq.  4.  Moreover,  the  sire  uf  that  system  scales  as  0(;iJ).  so  its  solution  by  a  direct  methoil  requires 
0(»*)  operations  |H*r  generation.  Tims,  there  is  a  trade-off  between  the  computational  complexity  of  the 
RBFGI  method  and  tin*  gain  due  to  fewer  function  evaluations.  For  ex|H*iisivi*  objective  functions  the 
cost  of  adding  a  gradient  individual  propagated  In  local  radiid  basis  function  nppiuxiiuntioii  is  expected 
to  be  incidental  and  tin*  increase  in  speed  can  be  enormous.  As  a  remit.  tin*  RBFGI  approach  should 
he  incorporated  into  evolution  strategies  for  expensive  functions  as  it  can  sometimes  greatly  Increase 
converge  speed  and  reliability  with  little  downside. 
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Hybrid  Optimization  using  an  Evolutionary  Strategy  and 
Surrogate  Assisted  Local  Search 
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We  present  a  new  derivative-free  hybrid  algorithm  for  global  optimization  of 
expensive  black  Ixix  functions.  The  algorithm  uses  an  evolution  strategy  for 
global  search.  Convergence  toward  local  minima  is  accelerated  by  including  a 
local  search  individual  in  each  generation.  The  local  search  individual  is  com¬ 
puted  by  extracting  derivative  information  from  a  radial  basis  function  approx¬ 
imation  to  the  objective  function  interpolated  from  previously  evaluated  points 
in  the  evolutionary  strategy.  This  hybrid  approach  docs  not  require  artificial 
or  user-defined  switching  between  global  and  local  search.  Numerical  results 
are  presented  on  mathematical  test  problems  from  the  optimization  literature 
and  for  a  small  dimensional  conceptual  watershed  model  from  hydrology. 

Keywords:  Hybrid  algorithms:  Evolution  strategies:  Quasi- Newton  method 


1.  Introduction 

Evolution  strategies  are  known  to  be  reliable  but  expensive  for  approximating  global 
optima  particularly  for  multi-modal  fitness  functions.  A  large  population  size  is  required 
for  the  strategy  to  explore  the  real  parameter  space,  but  this  slows  local  convergence. 
In  contrast,  classical  gradient-based  algorithms  are  exploitative  in  nature  and  converge 
quickly  to  local  minima,  but  they  are  not  good  at  finding  the  glolxal  minimum. 

Many  algorit  hms  at  tempt  to  accelerate  the  convergence  of  the  evolutionary  st  rategy,  or 
other  population- based  search  method,  by  either  switching  to  a  local,  usually  gradient- 
based.  search  at  some  user-defined  threshold  or  by  applying  some  local  search  operator  at 
every  generation  so  that  the  global  and  local  searches  are  interwoven.  Examples  of  the  for¬ 
mer  strategy  include  switching  from  part  icle  swarm  optimization  to  sequential  quadratic 
programming  at  a  user  defined  threshold  (Min  et  at.  2()07)  and  similarly  switching  from 
a  genetic  algorithm  to  a  Levenberg-Marquardt  algorithm  (Peters  et  at.  2010).  Examples 
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of  interwoven  strategics  include  incorporation  of  an  extra  individual  at  each  generation 
of  an  evolution  strategy  calculated  from  an  approximated  Newton  step  (Woo  et  al.  2004, 
Talik  et  al.  2007.  2000).  the  use  of  a  discrete  gradient  oj>erator  to  improve  the  best  in¬ 
dividual  iu  each  generation  of  an  evolution  strategy  (Abbas  cl  at.  2003),  and  a  hybrid 
genetic  algorithm  that  uses  a  quasi-Newton  step  to  attempt  to  improve  the  fitness  of 
every  individual  at  each  iteration  (Renders  and  Flasse  1996). 

Another  approach  taken  to  lessen  the  number  of  expensive  objective  function  evalu¬ 
ations  iu  evolutionary  strategies  and  other  population- based  algorithms  is  to  use  func¬ 
tion  approximation  models  as  surrogates  for  the  objective  function.  Typically,  estimated 
function  values  are  used  to  screen  offspring  and  the  more  expensive  objective  function  is 
evaluated  only  at  the  most  promising  offspring  (Regis  and  Shoemaker  2<)(H.  Kern  et  at. 
20(H).  In  t  his  paper  we  propose  to  use  similar  function  approximation  models  to  approx¬ 
imate  derivative  information  for  a  local  search  that  will  be  interwoven  with  the  global 
evolutionary  st  rategy.  In  fact  ,  the  same  function  approximation  model  can  l>e  used  toap- 
proximate  derivatives  for  the  local  search  and  as  a  surrogate  to  screen  objective  function 
values  lor  the  global  search,  though  we  do  not  report  on  that  here. 

We  follow  essentially  the  same  framework  as  t  he  hybrid  evolutionary  strategy  first  out¬ 
lined  in  (Woo  et  at.  20IM)  and  subsequently  improved  iu  (Tahk  et  al.  2007.  2009).  In  that 
algorithm,  a  standard  evolutionary  strategy  is  used  to  advance  the  ]x)pulation  at  each 
generation.  Additionally,  an  individual  called  the  gradient  individual  is  propogated  along¬ 
side  the  evolving  population.  The  gradient  individual  is  calculated  by  making  a  Newton 
update  from  the  gradient  individual  of  the  previous  generation  or  from  the  fittest  point 
of  the  current  generation.  The  gradient  is  estimated  from  a  least  squares  finite  difference 
approximat  ion  and  the  Hessian  is  iteratively  approximated  by  one-dimensional  finite  dif¬ 
ference  updatrc  using  the  previously  evaluated  points  iu  the  population.  The  gradient 
individual  hybrid  approach  has  been  shown  to  improve  convergence  of  the  standard  evo¬ 
lutionary  strategy  on  a  small  number  of  mathematical  test  problems.  One  drawback  to 
the  algorithm  is  that  the  individuals  iu  the  evolving  population  are  selected  symmet¬ 
rically  about  the  estimated  minimum  of  the  current  generation:  xmju  ±  Ax.  Selecting 
the  population  symmetrically  increases  the  accuracy  of  the  finite  difference  discretiza¬ 
tions  of  the  derivatives,  however  the  symmetric  population  seems  to  interfere  (Baggett 
and  Skahill  2010)  with  the  convergence  of  the  covariance  matrix  adaptation  variation 
of  the  evolutionary  strategy  (C'MAES  due  to  Hansen  and  Ostermeier  1996).  While  we 
have  not  determined  exactly  how  the  symmetrized  population  interferes  wit  h  adaptation 
in  C’MAES.  we  think  it  is  likely  related  to  the  covariance  matrix  u|>datiug  and/or  the 
step  length  updating  within  that  algorithm.  Another  drawback  to  the  symmetrized  finite 
difference  hybrid  algorithm  is  that  the  minimum  population  size  for  the  evolutionary 
strategy  is  2n.  where  n  is  t  he  dimension  of  t  he  search  space. 

Our  proposed  algorithm,  however,  differs  from  t  he  gradient  individual  hybrid  evolution 
strategy  (Woo  ct  al.  2004,  Tahk  et  at.  2007,  2009)  in  two  important  respects:  first,  in¬ 
stead  of  using  finite  differences  to  approximate  local  derivative  information  we  fit  a  local 
function  approximation  model  to  previously  evaluated  points  and  differentiate  the  model 
analytically,  and  second,  the  evolution  strategy  is  the  covariance  matrix  adaptation  evo¬ 
lution  strategy  (C’MAES)  (Hansen  and  Ostermeier  1996).  The  use  of  the  surrogate  model 
to  approximate  the  derivatives  allows  the  estimation  of  derivatives  without  making  any 
modifications  to  the  basic  structure  of  the  evolutionary  strategy.  In  fact,  this  approach 
allows  for  hybridization  of  virtually  any  population-based  search  algorithm.  In  this  arti¬ 
cle  we  will  focus  on  hybridization  of  C’MAES  which  is  known  to  l>o  particularly  effective 
at  approximating  global  optima.  C'MAES  is  especially  effective  at  locating  glol>al  optima 
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when  used  in  conjunction  with  a  populat  ion  doubling  strategy  as  described  in  Auger  and 
Hansen  (2005).  More  recently,  a  two  |>opulation  restart  strategy  in  which  one  insula¬ 
tion  grows  exjxmentially  huger  while  the  other  is  maintained  at  a  small  size  has  shown 
promise  (HaiLsen  2009).  hut  is  not  explored  here. 

The  derivative  estimation  method  will  l>e  explained  first  and  it  will  be  shown  how  the 
approximated  derivatives  are  used  to  perform  local  search  by  calculating  a  “local  search 
individual"  in  each  generation.  Next  the  hybrid  algorithm  will  be  summarized.  We  have 
implemented  the  hybrid  algorithm  in  CM  AES.  and  it  will  be  shown  how  the  new  hybrid 
algorithm  performs  on  a  small  suite  of  test  functions  in  10  ami  30  dimensions.  Finally, 
we  briefly  demonstrate  the  calibration  of  a  conceptual  watershed  model  using  the  hybrid 
algorithm. 


2.  Local  Search  using  Radial  Basis  Functions 

The  foundation  of  this  optimization  algorithm  is  an  evolution  strategy  in  which  new 
offspring  are  produced  at  each  generation  by  recombination  and  mutation  (see  Figure 
2).  The  objective  function  is  evaluated  at  each  of  these  offspring  and  the  fittest  offspring 
are  selected  as  parents  for  the  next  generation.  In  the  hybrid  approach,  an  additional 
individual,  called  the  local  search  individual,  is  proixagated  by  a  different  mechanism 
each  generation.  Tahk  ct  al.  <2tl()7)  refer  txx  this  additional  individual  as  the  gradient 
individual,  but  we  call  it  the  local  search  individual  since  in  practice  it  could  be  the  result 
of  any  local  search  that  is  a  function  of  the  previously  evaluated  |>oints.  The  current  local 
search  individual.  xjs.  is  either  the  fittest  offspring  of  the  current  generation  (individual 
with  lowest  function  value)  or  the  local  search  individual  from  the  previous  generation. 
From  information  gathered  during  the  evolution  of  the  population  the  first  and  second 
derivatives  of  the  objective  function  are  estimated  at  ami  used  to  perform  an  update 
of  the  local  search  individual  which  is  hopefully  moved  closer  to  a  stationary  point. 

As  an  evolution  strategy  proceeds,  it  typically  does  not  use  the  previously  evaluated 
points  beyond  the  current  generation:  however,  in  our  hybrid  strategy'  we  store  the  last 
N  |>oints  and  their  evaluated  functions  values  in  a  database.  In  practice,  if  the  objective 
funct  ion  is  very  expensive  to  evaluate,  we  might  use  all  of  the  previously  evaluated  |H>ints. 
To  update  tin'  local  search  individual  we  use  a  A- nearest  neighbor  local  function  approx¬ 
imation  of  the  objective  function  using  the  k  nearest  neighlxxrs  (Euclidean  distance)  of 
jjs  in  the  datal>asc  to  construct  a  cubic  radial  basis  function  (RBF)  approximat  ion: 

* 

s(x)  =  Y,  “■.<’(11*  -  *11*)  +  P(*>.  *  €  R"  (1) 

i-1 


where  i u9i  =  1,2 _ ,&  are  the  k  nearest  neighlxois  of  jjs  in  the  n-dimensional  search 

space,  p  is  in  fl^1  (the  linear  space  of  polynomials  in  n  variables  of  degree  less  than 
or  equal  to  2),  and  <6(r)  =  r*.  Cubic  radial  basis  functions  were  selected  not  only  for 
their  simplicity  and  differentiability,  but  also  because  they  have  been  used  successfully 
as  surrogate  models  for  prc-evaluating  function  values  to  lessen  the  numl>er  of  function 
evaluations  required  by  an  evolution  strategy  (Regis  and  Shoemaker  2004). 

Define  the  matrix  <I>  €  Rkxk  by 


Wy  :=0(||x, 


(-) 
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Let  it  =  (n  4*  l)(n  -f  2)/2  l>o  the  dimension  ol  H*},  let  p\ . pti  be  a  basis  of  this  linear 

space,  anti  define  the  matrix  P  €  as  follows: 

P,,  '=  P,{x>h  '  =  1 . *;j  =  1,  — .»».  (3) 

In  tills  model,  the  RBF  that  interpolates  the  points  (*i, /(i,  )>,...  (xt,  fin))  Is  ob¬ 
tained  by  solving  the  system 


where  F  =  (/(*, )..... /(a*))T, <«'  =  (wi, . . . , ■»*)  €  R*  and  c  =  c*)T  G  R'v. 

Powell  ( 1992)  gives  sufficient  and  necessary  conditions  for  the  system  above  to  l>o  uniquely 
solvable,  but  in  practice  the  real  issue  can  1  >e  that  the  coefficient  matrix  above  becomes 
ill-conditioned.  However,  we  have  found  that  simply  rescaling  and  shitting  the  ]>oints 
j|... . . X*  so  that  they  all  lie  in  | —  1 .  l]n  is  usually  sufficient  to  address  this  issue. 

Once  the  RBF,  #(x).  has  been  determined  by  Eq.(l),  then  s(x)  is  differentiated  analyt¬ 
ically  to  determine  approximations  to  the  gradient  and  Hessian  of  the  objective  function, 
/(x).  For  the  gradient  vector,  g.  evaluated  at  the  local  search  individual,  xj^: 

«  =  (v/(*|s))1  =  ^:/(4)“(^(41)),  =  ^*(Ji»)'  . "  w 

For  the  Hessian  matrix.  //.  evaluated  at  the  gradient  individual,  x^: 

"•>  =  (»<*L>«)  -  ■ . “  <«) 

Similar  tediniques  for  derivative  approximation  are  routinely  used  in  the  numerical  so¬ 
lution  of  partial  differential  equations  using  so-called  meshless  metho<ls.  Moreover,  such 
approximations  can  be  spectrally  accurate  (faster  t  han  |>olyuomial  in  the  grid  size)  de¬ 
pending  on  the  selection  of  interpolation  points  (Fornberg  et  al.  2009). 

Once  the  offspring  and  their  function  values  from  the  current  generation  have  been 
appended  to  the  database,  we  construct  the  RBF  jus  above  and  determine  the  gradient 
and  Hessian  approximated  at  the  current  local  search  individual  xj^.  Finally,  a  new  local 
search  individual  is  found  by  the  standard  update: 

4"=4.-"  '»■  (7) 

The  new  local  search  individual  is  then  added  to  the  current  generjition  of  offspring 
for  possible  selection.  When  the  population,  and  hence  the  points  in  the  database,  are 
sufficiently  close  to  a  minimum  then  the  gradient  and  Hessian  can  be  accurate  and  can 
yield  fast  local  convergence  to  the  minimum.  However,  when  the  evolving  ]x>pulation  is 
far  from  a  local  minimum,  then  the  gradient  and  Hessian  tend  to  be  inaccurate  and  the 
update  does  not  lead  to  a  minimum.  In  such  cases,  xj^  1 ,  is  typically  not  selected  jus  an 
offspring  by  the  evolutionary  strategy  and  thus  docs  no  harm  to  convergence  of  the  gloal 
search. 

This  method  of  gradient- based  search  is  similar  to  a  quasi-Newton  method,  but  in 
quasi-Newton  methods  the  first  derivatives  of  the  function  are  usually  available  (or  ap- 
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proximated  by  centered  finite  differences),  and  the  Hessian  mat  rix  is  updated  iteratively 
with  information  from  new  function  and  gradient  evaluations.  In  fact,  the  original  evo¬ 
lution  strategy  hybrid  approaches  of  Tahk  ct  al.  (20)7.  2009)  used  the  quasi-Newton 
method  to  advance  their  gradient  individual.  In  contrast,  in  this  proposed  approach,  the 
Hessian  matrix  Ls  completely  recomputed  at  each  generat  ion  from  a  local  approximation 
to  the  objective  function.  In  the  next  subsection  we  give  a  synthetic  numerical  compari¬ 
son  to  demonstrate  that  the  numerical  accuracy  of  our  method  and  that  of  Tahk  ct  al. 
(2009)  are  comparable.  The  main  difference  in  the  methods  is  that,  due  to  the  use  of  the 
function  approximation  model  for  derivative  calculation,  our  algorithm  docs  not  require 
symmetrizing  the  population  and  tints  can  be  itsed  to  estimate  derivatives  and  add  local 
search  to  any  population-based  search  algorithm. 

The  local  search  technique  proposed  here  could  easily  be  modified  so  that  the  local 
search  individual  is  not  simply  the  approximate  Newton  point.  For  instance,  because 
we  have  the  local  function  approximation  available,  the  local  search  individual  could  l>e 
the  result  of  a  trust  region  search  of  the  radial  basis  function  along  the  Newton  update 
direction  similar  to  the  approach  descrilxd  by  Wild  et  al.  (24KIS).  In  fact,  any  kind  of 
local  search  algorithm  that  utilized  primarily  the  previously  collected  |>oints  could  l>e 
used  to  find  the  local  search  individual. 

2.1.  A  numerical  comparison 

To  demonstrate  the  capability  of  the  radial  basis  function  approach  to  approximating 
derivatives  the  10-dimensional  generalized  Rosenbrock  function  (see  Table  2)  is  used  as 
a  test-case.  To  facilit  ate  a  direct  comparision  with  the  Hessian  approximation  approach 
descril>ed  by  Tahk  ct  al.  (2009)  the  following  synthetic  numerical  experiment  was  con¬ 
ducted. 

The  glol>al  minimizer  of  the  Rosenbrock  function  is  the  point  1  consisting  of  all  ones. 
A  synthetic  sequence  of  points  was  selected  that  converges  toward  the  global  minimizer 
at  a  constant  rate: 

i<‘)  =  i  +  o.oii2LzI(i.2,:».  l.  :>.<>.  7. s.<>.  ii>).  .  =  i . 100  (8) 

At  each  iteration  a  population,  from  a  normal  distribution  as  in  an  evolution  strategy, 
of  A  =  80  points  is  generated  around  in  two  steps.  In  the  first  step,  //  =  40  points 
are  generated  by: 

x  =  i(0  +  0.01a,  (9) 

where  z  is  selected  from  the  10-dimensional  staudard  normal  distribution  with  mean 
0  and  variance  1.  In  the  second  step.  40  additional  points  are  generated  by  reflecting 
the  first  40  points  through  the  center  point  jt*1*  -  this  center  point  will  serve  as  the 
linearization  point  for  each  iteration  and  the  method  proposed  by  Tahk  ct  al.  (2009) 
utilizes  the  reflected  |>oints  to  produce  higher  order  approximations  to  the  gradient  and 
Hessian.  The  Rosenbrock  function  is  evaluated  at  the  SI  points  ami  the  |x>ints  and 
function  values  are  stored  in  a  database  for  use  by  our  radial  basis  fuuction  approach. 

We  do  not  present  the  details  of  the  approach  of  Tahk  ct  al.  (2009)  here,  but  at 
each  iteration  the  gradient  is  approximated  by  a  least  squares  fit  of  the  centered  finite 
differences  of  the  symmetric  pairs  through  the  linearization  ]x)int.  Moreover,  the  inverse 
Hessian  is  sequentially  updated  with  the  centered  difference  second  derivative  information 
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from  each  symmetric  pair  of  ]x)ints.  Utilizing  these  approximation  techniques,  at  each 
iteration  we  obtain  an  approximation  to  the  gradient  and  inverse  Hessian:  .9Xahk  rtn<^ 
^Talik'  respectively. 

At  each  iteration  two  different  approximations  to  the  stationary  point  are  computed. 
The  first  is  the  from  t  he  approximation  due  to  Talik  ct  al.  (2009): 


4ahk  =  X,°  +  '“Talik  =  -  ^Tahk^Talik-  <10) 

The  second  approximation  uses  the  nearest  k  =  132  points  from  among  the  last  N  =  264 
points  stored  in  the  database*  to  compute  the  radial  Ixasis  function  approximations  to  the 
gradient  and  Hessian  approximations  described  in  equations  (5)  and  ((>):  we  label  these 
ffrbf  -"‘<1  respectively 

4bf  =  x"'  +  *rbf  =  ~  ",bW 


We  compare  the  update  vectors  and  to  the  analytically  computed  update 

vector  in  Figure  1(a).  For  the  first  50  iterations  the  /iTahk  *s  A  better  approximation  to 
the  analytically  computed  update  vector  than  computed  by  radial  Ixasis  functions,  but 
for  iterations  51  11)0  the  approximations  are  comparable.  Even  though  is  not  always 
the  most  accurate  approximation  to  the  update  vector,  it  turns  out  that  it  does  provide 
a  good  search  direction.  Figure  1(b)  shows  the  distance  from  each  of  the  approximated 
stationary  ]K)ints.  ^  and  as  well  as  the  analytically  computed  Newton  point,  to 
the  global  minimum  ol  the  lO-tlimensional  Rosenbrock  function.  It  can  l>e  seen  that  the 
radial  Ixasis  function  approximation  local  search  individuals  and  the  Talik  ct  al.  (2009) 
gradient  individuals  give  comparable  approximations  to  the  global  minimizer  in  spite  of 
the  fact  that  is  sometimes  a  less  accurate  approximation  to  the  true  update  vector 
(from  the  analytic  derivatives)  than  that  of  In  both  plots,  it  can  Ixe  seen  that  the 

radial  basis  function  local  search  individiual  is  occasionally  quite  a  bad  approximation 
to  the  global  minimum.  This  apjxoars  to  happen  because  the  problem  of  radial  basis 
function  interpolation  and  derivative  interpolation  is  ill-conditioned  and  very  sensitive 
to  the  choice  of  interpolation  points.  In  most  generations  our  simple  choice  of  nearest 
neiglxor  points  is  adequate,  but  not  always.  This  will  l>e  the  subject  of  future  work. 

The  radial  basis  fuuction  approach  to  computing  the  local  search  individual  is  the 
more  expensive  algorithm,  but  can  be  adapted  to  be  used  with  any  population  based 
algorithm  as  it  requires  no  modification  of  the  underlying  sampling  algorithm.  This 
makes  the  radial  basis  function  approach  suitable  to  Ixe  used  with  t  he  covariance  matrix 
adaptation  evolution  strategy  as  will  be  described  in  the  next  section. 


3.  Covariance  Matrix  Adaptation  Evolution  Strategy  with  Local 
Search  Individual 

The  Ixasic  outline  of  t  he  hybrid  evolution  strategy  algorithm  is  illust  rated  by  the  Howchart 
in  Figure  2.  To  study  the  efficacy  of  this  version  of  our  hybridized  approach  we  have 
implemented  the  algorithm  in  the  context  of  CMAES.  While  we  have  also  done  this  in 
the  context  of  the  standard  evolution  strategy,  we  use  CMAES  here  because  it  seems 
to  have  better  global  convergence  properties  (Hansen  and  Kern  20(11.  Hansen  2009).  In 
particular,  we  utilize  the  Matlab  version  of  CMAES  (version  2.54)  provided  pnblically 
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Figure  1.  Comparison  of  rbf  local  search  ami  Tahk  local  search  for  19-dimenskmal  Roaenbiock 
fu ni  t  ion.  Figure  (a)  compares  the  update  vectors  for  the  two  methods:  Figure  (1>)  .shows  that  the 
distance  from  tin*  approximated  stationary  point  to  the  glolxal  minitnizer  ks  comparable  for  tin*  two 
methods. 


Figure  2.  Flowchart  for  the  hybrid  evolution  strategy  algorithm 


by  Hansen  (20(H). 

CM  AES  is  an  evolution  strategy  that  adapts  a  full  covariance  matrix  of  a  normal 
search  distribution  (Hansen  and  Ostermeier  1996).  The  strategy  begins  with  an  initial 
population  of  A  individuals  xj^j  y  After  evaluating  the  objective  function,  the  best  /j 
individuals  are  selected  as  parents  and  their  centroid  is  computed  by  using  a  weighted 
average:  (x)^‘  =  where  the  weights.  wi%  are  positive  reals  and  sum  to 

one.  The  notation  x*  ^  is  called  selection  notation  and  represents  the  point  wit  h  the  k**‘ 
lowest  corresponding  objective  lit  ness  value.  While  many  weighting  schemes  have  been 
proposed,  here  we  use  the  super-linear  weights:  wt  =  ln(/i)  —  ln(i),i  =  l,...,/i*  wherein 
the  individuals  with  lowest  fitness  values  get  the  highest  recombination  weights. 
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After  selection  and  recombination  a  new  population  is  created  by: 


xt+& =<*$+*<*>  *«w'*-i* 


(12) 


where  zi  *v  jV(0.  I)  are  independent  realizations  of  an  n-dimonsiona)  staudar<l  normal 
distribution  with  mean  zero  and  eovarianee  matrix  I.  The  base  points,  zt,  are  rotated 
and  scaled  by  the  eigenvectors,  B1'' and  t  he  square  n>ot  of  the  eigenvalues  .  Dn,  of 
the  covariance  matrix  ,  C,f).  The  covariance  matrix.  C  '  and  global  step  size,  are 
updated  after  each  generation.  This  approach  yields  a  strategy  that  is  invariant  to  any 
linear  transformation  of  the  search  space.  Equations  for  initializing  and  updating  the 
st  rategy  parameters  are  given  in  (Hansen  and  Kern  20(H).  For  complete  det  ails  on  the 
(’MAES  algorithm  the  tutorial  (Hansen  2010)  is  a  definitive  source. 


Table  1.  I’wimIo-okIc  <>f  the  hybrid  algorithm. 


1 

2 

3 

4 

5 

<> 

7 

8 

9 

10 
11 
12 

13 

14 

15 
1(> 


Generate  and  evaluate  initial  population  x)"’, 

Append  these  points  to  the  database 
Choose  lust  individual  and  set  zj"1 

Set  I  =  0. =  Cltix  l ,  II"  =  Cf0)  =  /nxn 

repeat 

Compute  nearest  neighbor  RBF  and  find  i/t  and  lh 
Update  the  hical  search  individual  11  =  X|s  -  (h„ 

Evaluate  11  and  append  to  the  database  if  feasible 
Select  the  l>est  p  individuals  from  the  population  and  zj^'  *' 
Generate  new  population  .V  ‘  __ '  ^  by  recombination  and  mutation 
Evaluate  new  population  and  append  to  database, 
if  min, /(x,)  <  /(»£')  *hen 
swap  individual  with  lowest  function  value  and  zj^' 
end  if 
t  =  t  +  1 

until  Stopping  criteria  are  satisfied 


Pseudo-code  lor  the  CMAES-RBFLSI  algorithm  is  shown  in  Table  1.  One  additional 
feature  that  has  not  been  mentioned  is  that  a  weighted  norm  is  used  to  compute  nearest 
neighl>ors  for  determining  the  support  of  the  local  radial  basis  function  approximation. 
We  use  the  current  covariance  matrix  of  the  CM  AES  since  it  should  reflect  the  shape 
of  the  search  distribution  and  the  objective  funetiou  surface.  The  eigenvector/ eigenvalue 
decomposition  of  the  current  covariance  matrix  is  C  =  BD2Br .  The  distance  between  a 
point  j*  €  Rn  and  the  current  gradient  individual  is  measured  as  \\{IiD) " 1  ( j:  —  J|s)||j. 
(For  instance,  a  unit  hall  in  this  norm  will  he  elliptically  shaped  to  lit  in  a  long  narrow 
valley  in  the  search  space.)  The  nearest  neighl>ors  in  this  norm  should  he  ideal  points  for 
constructing  an  approximation  to  the  llessiau  matrix. 


4.  Hybrid  Algorithm  applied  to  mathematical  test  suite 

A  small  suite  of  test  problems  Ims  lx?en  selected  to  compare  the  performance  of  our  hybrid 
CMAES  local  search  individual  algorithm  (CMAES-RBFLSI)  with  ordinary  CMAES. 
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Our  aim  is  to  establish  that  the  hybridized  approach  does  not  change  the  glolkal  conver¬ 
gence  properties  of  CM  AES  while  accelerating  the  local  convergence  rate.  For  compari¬ 
son.  we  have  also  implemented  the  quasi-Newton  Hessian-approximation  gradient  indi¬ 
vidual  approach  of  Tahk  ct  al.  (2000)  in  the  context  of  CM  AES  (their  implementation  was 
in  the  standard  evolution  strategy);  we  refer  to  this  implementation  as  CMAES-QNGI. 

In  CMAES-QNGI,  after  recombination  (finding  the  centroid  of  the  selected  parents), 
only  the  tirst  half  of  the  new  population  of  individuals  is  generated  by  equation  (12).  After 
evaluating  the  objective  function  for  these  individuals,  the  current  gradient  individual 
is  swapped  for  an  individual  with  lower  objective  function  value,  if  one  exists,  to  ensure 
that  the  gradient  individual  is  the  current  best  point.  The  remainder  of  the  current 
populat  ion  is  formed  by  reflecting  t  he  hist  half  of  the  population  symmetrically  through 
the  gradient  individual  in  Rn.  This  symmetrically  selected  population  reduces  the  order 
of  the  discretization  errors  in  forming  the  first  and  second  derivative  approximations  at 
the  gradient  individual. 

A  summary  of  the  selected  test  functions  is  shown  in  Table  2.  Function  f\  is  the 
quadratic  Schwefel  1.2  function  and  is  a  test  case  for  CM  AES  because  its  elliptical  con¬ 
tours  test  the  ability  of  the  algorithm  to  adapt  the  shape  of  the  search  distribution.  /» 
is  the  cone  function  selected  for  its  lack  of  differentiability  at  the  minimum.  /;*  is  the 
classical  generalized  Rosenbrock  function  which  has  two  minima  for  n  >  1  and  the  long 
narrow  valley  which  slows  convergence  for  many  algorithms,  /§  is  the  Schwefel  1.5  func¬ 
tion  which,  while  unimodal,  is  not  different  iable  if  any  xt  =  0.  i  =  1. - n.  The  Griewank 

function  /•,  is  multimodal,  but  is  not  particularly  challenging  for  CM  AES  and  is  smooth. 
The  Hastrigin  function.  /ti.  is  multimodal  and  smooth  and  is  a  difficult  problem  for 
CMAES  (Auger  and  Hansen  2005).  Finally,  the  Acklev  function,  f?,  is  multimodal  and 
smooth  except  at  the  minimum  and  is  somewhat  challenging  in  larger  dimensions  for 
CMAES. 


2010  9:44  Engineering  Optimization  (;rpapcaT2010'last 


Table*  2.  Tat  Functions  for  Hybrid  Optimixation  Strategy 


Name 

Definition 

Search  Domain 

Schwefel  1.2 

n  <  \2 

/><*) = it, 

-1  ;-l  / 

(-40,60)" 

Cone 

(-40,60)" 

Rusenbrock 

/»  l 

/*(*)  =  E[l00(r,+i-*?)a  +  (*,-l)2 

1-1 

(-40,60)" 

Schwefel  1.5 

1-1  *-l 

(-40,  60)" 

Griewank 

aw  -n-U) 

(—600,600)” 

Rast  rigin 

/«.<*)  =  10n  +  53  it,  (*?  _  I0cos  (2*1*)) 

<-l  1— l 

(-40,60)" 

Ackley 

f7{x)  =  -20 exp  ^-0.2^i£>:j 

-  CXP  ^  cos  +  20  +  C 

(-32,32)" 

We  examine  only  the  convergence  graphs  to  compare  the  algorithms.  The  C'MAES- 
RBFLSI  algorit  hm  is  expensive  since  it  constructs  the  local  radial  basis  function  approx¬ 
imation  at  each  generat  ion  based  on  0{n2)  points  using  a  naive  linear  solver  t  hat  requires 
0(;/')  operations.  The  solution  time  of  (1)  can  l>e  improved  by  using  a  null  space  method 
or  iterative  methocls,  but  the  presumpt  ion  here  is  that  the  objective  function  evaluations 
are  very  expensive.  Thus  local  search  algorithms,  even  expensive  ones,  can  and  should 
be  used  to  improve  the  efficiency  of  the  overall  search. 

Figures  3-9  shows  median  convergence  graphs  based  on  a  set  of  30  trials  for  each  of 
the  test  functions  for  each  of  the  three  algorithms:  CM  AES.  CMAES-RBFOI.  CMAES- 
QNGI.  In  each  figure  the  results  for  dimension  n  =  10  are  shown  in  plot  (a),  and  the 
results  for  dimension  n  =  30  arc  shown  in  plot  (b).  We  do  not  show  the  results  of 
CMAES-QNGI  at  n  =  30  since  that  algorithm  does  not  perform  well  as  will  l>e  discussed 
further  1m?1ow.  For  n  =  10.  the  population  size  A  =  30  with  /i  =  15  patents  being  selected 
at  each  generation,  while  with  n  =  30  we  use  A  =  80  and  /i  =  40..  The  initial  global 
step  size,  a  is  set  to  30%  of  the  total  length  of  the  search  domain  in  each  dimension. 
The  initial  ]>opulatiou  is  sampled  from  a  uniform  distribution,  and  the  same  samples 
are  used  to  initialize  each  of  the  three  algorithms.  For  the  OMAES-RBFLSI  algorithm, 
the  local  RBF  approximation  is  constructed  using  the  k  nearest  neighbors  with  k  = 
(n  -f  l)(;i  +  2)  which  is  twice  the  minimum  numl>er  of  points  necessary  to  construct 
a  quadratic  |x>lynomial  intcr]>olaut  in  Rn.  The  k  nearest  neighbors  are  selected  from 
among  the  hist  N  =  2k  individuals  that  have  been  evaluated.  The  algorithm  is  stopped 
when  a  minimum  objective  funct  ion  value  of  10  10  is  reached  or  when  the  best  objective 
function  value  dors  not  change  for  the  last  10  +  30;i/A  generations  or  when  the  ratio 
of  the  range  of  the  current  function  rallies  to  the  maximum  current  fuuction  value  is 
below  TolFun=  5  x  10  111 .  The  maximum  number  of  function  evaluations  was  set  to  l>o 
n  x  104. 

Several  points  can  l>e  made  upon  inspection  of  the  convergence  graphs  in  Figures  3-9. 
Fiist.  the  approach  of  Talik  e.t  al.  (2009)  appears  to  interfere  with  the  convergence  of 
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Figure  X  Convergence  graph*  for,  tin*  Sclnrefol  1.2  quiulntfic.  unlmodal  function  In  10  (a)  anil 
Ull  |b)  ilmicnsniiL-.  Convergence  ks  nearly  Instantaneous  for  C'MAES-RUFLSl  once  enough  pulut. 
arc  obtained  for  an  Inter  poliint. 


(A)  <b) 

Figure  1  Couvergenre  graphs  fin.  f2.  the  com?  function  in  111  |u)  .mil  .'#>  (b)  dimensions.  The  cone 
function  U  unimodal.  but  not  diifonttinbU*  .it  the  miuituum.  Speedup  in  neglihle. 


(’MAES.  This  does  not  seem  to  lie  a  case*  af  simple  error  in  implement  in”  their  strategy 
as  we  have  been  able  to  successfully  implement  and  ti^t  it  in  the  context  of  a  stundiud 
evolution  strategy:  results  not  shown  here.  Hat  her  the  symmotrization  of  the  imputation 
at  each  generation  seems  to  interfere  with  the  covariance  matrix  adaptation  and/or  step 
size  adaptation  algorithm  in  ('MAES.  It  may  l>o  passible  to  develop  a  new  version  ol 
(’MAES  which  can  accomodate  the  symmetrized  population  used  in  C’MAES-QNGL  but 
that  is  beyond  the  scope  of  the  current  paper.  The  RBF  local  s enroll  individual  approach 
does  not  have  this  difficulty  since  it.  docs  not  modify  the  existing  population  uf  the 
evolution  strategy  other  than  simply  appending  an  extra  individual  to  the  population. 

A  second  observation  is  that  for  simple  functions  which  are  sufficiently  smooth  near 
their  minima,  such  as  the  S’chewcfel  1.2  quadratic  {f\\  Figure  ;i)  and  Koscnhrock  (/p 
Figure  5)  functions  the  CMAES-RBFLSI  algorithm  requires  significantly  fewer  function 
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(a)  <b) 

Figure  5.  Convergence  giapli*  for.  /j.  the  generalized  Roseiihrodt  function  in  10  (u)  and  30  (l») 
dimension*.  It  has  tun  minima  ami  b  smooth,  hut  lute  a  long.  Hitt  ami  nnriuw  volley  that  mnkfc* 
optimization  slow  fm  most  algorithms.  CMAES  clearly  benefits  fmiu  local  search  as  tin*  minimum 
b  appxiMi lied. 


(a)  (b) 

Figure  6.  Convergence  graphs  fur.  /4.  the  Sdiwefol  1.S  function  in  10  (a)  and  .10  <b)  dimension*.  It 
b  umiuodnl  with  the  minimum  at  x  =  0.  but  b  not  ditfVieut  liable  if  any  sa  =  0.  i  =  1. - it. 


evaluations  than  standard  CMAES.  However,  wheu  the  function  b  not  differentiable  at 
the  minima  the  speedup  due  to  the  proposed  algorithm  Is  quite  small.  The  latter  is 
evident  in  the  cone  functon  [f  >:  Figure  l),  Schwefel  1.5  functian(/i;  Figure  0).  and  even 
the  multimodal  Ackley  function  1/7:  Figure  9). 

The  hybrid  algoiit  bin  performs  well  for  multimodal  functions  as  well.  For  instance,  in  II) 
and  10  dimensions  there  are  significant  reductions  in  the  number  of  function  evaluations 
for  the  Griewank  function  (/•:  Figure  7).  The  Griewank  function  is  a  relatively  easy 
function  for  CMAES  to  optimize,  hut  the  CMAES-RBFLSI  is  able  converge  more  quickly 
in  the  vicinity  of  the  smixith  global  minimum.  In  fact,  in  many  of  the  trials,  the  radial 
basis  function  local  search  individual  is  very  close  to  the  global  minimum  very  early  in 
the  run.  giving  rise  to  the  dips  in  the  convergence  curves  for  CMAES-RBFLSI  in  Figure 
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(a)  <b) 


Figure  7.  Cumvrgmco  gtuplin  foi.  /&.  the  Cli  icw-nnk  function  In  10  (a)  and  30  (b)  dimensions.  It 
luultiiuudal  iiiul  smooth.  While  the  Giicwauk  funrtiuu  it  nut  piulkulnrly  cludhuigiug  fur  CMAES. 
ncvfitlH'li-ts  is  greatly  accelerated  by  Inclusion  of  the  local  search  individunl.  The  nuliid  basis  function 
si'juch  individual  often  iwai ly  finds  tin-  global  uptimum  early  in  I  he  optimization  riui  giving  rise  to 
the  dips  in  these  convergence  graphs. 


00  (b) 

Figure  8.  Convergence  gi.iplm  for,  /*..  the  RaMrigiii  function  in  10  fa)  imd  30  (b)  dimension*.  It  is 
multimodal  and  smooth.  For  this  function  restarts  and  pnpulut ion  doubling  are  used:  see  tin-  text. 
ThU  functinu  is  difficult  for  CMAES  which  rarely  finds  the  global  mtinmum.  but  GMAES-RBFLSI 
b»  much  more  ivliable. 


7.  The  Rostrign  function  Figuiv  8|  is  more  interesting  as  the  population  size  needs 
to  lx?  quite  huge  to  locate  the  global  minimum.  For  this  function  a  imputation  doubling 
rest  ai  t  scheme  (Auger  and  Hansen  2tK)Fi)  was  used  in  which  the  algorit  hm  was  restarted 
iteratively  with  population  size  A*  =  2* A, k  =  1.2,11.  1.  and  fx  =  A*/2.  In  10  dimensions 
(’MAES  locates  the  global  minimum  in  uuly  5  of  the  30  trials,  while  in  3(1  dimensions 
only  1  of  the  30  trials.  While  with  C'MAES-RBFLSI,  the  glolnd  minimum  is  located  in 
22  or  3(1  trials  in  10  dimensions  and  in  20  of  3(1  trials  in  30  dimensions.  Evidently,  the 
proposed  algorithm  is  able  to  more  quickly  converge  to  local  minima  in  each  restart  *m<l 
is  t  ints  able  to  use  the  computational  budget  mure  efficiently  to  fiml  the  global  minimum. 


£  .= 
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(*)  <b) 

Figure  0.  Convergence  graphs  for.  /7.  the  Ackley  fuuctiuu  in  10  (a)  ami  M)  <b)  dimiaisioiis.  It  it> 
multimodal  .md  not  differentiable  .it  tin*  gluljal  minimum.  Fox  thin  luxa  tion  restart*  and  population 
doubling  lire  used:  see  tin*  text. 


5.  Application  to  calibration  of  a  watershed  model* 

Tu  demonstrate  lln*  utility  of  the  C MAES- RBF LSI  algorithm  iu  a  practical  setting  we 
used  the  algorithm  to  calibrate  HYMOD,  a  five- paramo  ter  conceptual  rainfall-nmoll 
model  (see  Figure  11).  introduced  by  Boyle  (20111)).  In  short,  given  time  series  of  daily 
prccipitatiim  (P)  and  evapotranspiratiun  (ET\  data  the  objective  is  to  tune  the  |uuam- 
ctcrh  so  that  the  least  square*  error  between  the  model  predicted  stream  flow  time  series 
ami  the  observed  stream  flow  time  series  is  minimized.  Such  problems  are  usually  charac¬ 
terized  by  multiple  iniuima,  sometimes  unidentifiable  parameters  and  even  discontinuities 
in  the  objective  function  (Duau  at  al.  10021. 


QuKfcfkmUrts 


SWlUmUr* 


Figure  10.  Schematic  representation  of  the  HYMOD  model:  from  (Vrugt  » t  a L  2003) 

In  attempts  to  parsimoniously  represent  the  salient  features  of  the  precipitatiuu-nmoU 
process  in  a  watershed  system,  the  HYMOD  model  model  structure  (Moore  1!)K5)  is 
characterized  by  two  series  of  linear  reservoirs:  iu  particular,  three  identical  quick  and  a 
single  slow  response  reservoir.  The  five  parameters  to  be  calibrated  for  the  model  stream 
flow  tu  match  the  observed  stream  flow  data  are:  the  maximum  ston  nfthi 

catchement.  f ’max1  the  degree  of  spatial  variability  of  the  soil  moisture  capacity,  i^xpi 
the  factor  distributing  flow  between  the  two  series  of  reservoirs.  Alpha;  and  the  risdrience 
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time  ol the  linear  quick  and  slaw  reservoirs,  /?,,  ami  /ffc.  respectively.  The  hydrologic  model 
parameters  are  inferred  by  adjusting  their  rallies  until  an  acceptable  level  of  agreement 
is  achieved  between  a  set  of  historical  observations  of  the  real  world  system  that  the 
model  represents  and  their  simulated  counterparts.  In  this  case,  the  objective  function 
is  simply  the  sum  of  the  squared  differences  between  the  observed  and  simulated  daily 
stream  flows. 

Three  years,  October  1.  1948,  to  September  30,  1951,  of  daily  hydrologic  data  from 
the  Leaf  River  watershed  were  used  for  model  calibration.  This  humid  11)11  km*  wa¬ 
tershed  is  located  north  of  Collins.  Mississippi.  The  data,  obtuined  from  the  National 
Weather  Service  Hydrology  Labratory.  consist  of  uieau  areal  precipitation  (min/d),  po¬ 
tential  evapotranspiration  (mm/d),  and  stream  flow  (mJ/s). 

Tublr  .1.  Uncertainty  ranyr*  ut  HYMOD  m<xlrl  parawrtrra _ 


Minimum 

Maximum 

THJt 

t  Tuns 

1.(101) 

500.0(10 

mm 

&exp 

0.1011 

2.000 

Alpha 

0.100 

0.900 

If, 

0.000 

o.:wo 

day 

0.000 

onoo 

day 

The  ('MAES  and  CMAES-RBFLSI  algorithms  are  each  applied  to  the  optimization  or 
calibration  of  the  HYMOD  model  fur  30  trials.  The  initial  ranges  of  the  parameter  values 
are  shown  in  Table  3:  similar  ranges  of  parameter  values  are  used  in  model  calibration 
study  in  (YYugt  ct  al.  2093).  As  for  the  mathematical  t€*st  functions  discussed  above, 
the  algorithms  are  initialuuxl  with  the  same  uniform  initial  distributions.  We  set  A  =  10 
and  fi  =  5.  Each  algorithm  stops  when  TolFun=  5c  —  1.  as  described  previously.  For 
CMAES-RBFLSI  the  tmml>cr  ol  nearest  neighbors  used  for  the  local  RBF  Ls  k  =  [1.5(n  + 
l)(/i  +  2)/2)  =  23  chosen  from  the  hist  N  =  'Ik  =  1C  evaluated  points. 


Figure  11.  Convergence  of  (’MAES  and  CMAES-RBFLSI  for  the  HYMOD  modeL 


There  are  many  local  minima,  but  CMAES  and  CMAES-RBFLSI  nearly  always 
converge  to  one  of  two  minima  =  (157.0790,9.5440,0.2370,0.2624,0.8178)  where 
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/hymod<xi)  =  128.5346  or  x.  =  (146.9868.0.7165,0.2416.0.2619,0.8313)  where 
/HYMOD<x2)  =  128.6:174.  The  global  minimum  appears  to  be  at  Xj  but  ('MAES 
converges  to  x2  in  28  of  :it)  trials,  while  CMAES-RBFLSI  converges  to  the  same 
minimum  in  27  of  HO  trials.  Evidently,  the  basin  of  attraction  for  the  global  minimum. 
Xj.  is  quite  small  as  lx»th  algorithms  have  trouble  finding  it.  The  accelerated  convergence 
of  CMAES-RBFLSI  to  the  local  minimum  of  the  HYMOD  model  is  demonstrated  in 
Figure  11.  For  each  trial  the  l>ost  function  value  is  saved  at  each  generation.  The  median 
function  value  over  the  thirty  trials  minus  the  value  at  the  local  minimum 
is  plotted  versus  the  number  of  function  evaluations.  As  Fiure  11  demonstrates,  the 
increase  in  convergence  speed  is  quite  dramatic:  CMAES-RBFGI  typically  converges 
with  fewer  than  half  of  the  objective  function  evaluations.  Though  neither  algorithm 
reliably  locates  the  glol>al  minimum,  both  algorithms  give  good  approximations  to 
the  global  minimum  that  produce  adequate  approximations  to  the  daily  stream  Hows. 
To  locate  the  global  minimum  reliably,  a  restart  strategy  could  l>e  used  as  with  the 
Rastrigin  function  above.  The  RBFLSI  method  would  still  accelerate  the  convergence 
significantly. 


6.  Conclusions 

The  local  search  individual  hybridization  approach  for  evolution  strategies  has  been 
shown  to  be  effective  for  significant  ly  accelerating  the  convergence  of  the  covariance 
matrix  adaptation  evolution  strategy  for  functions  which  exhibit  sufficient  smoothness 
near  the  minimum.  Likewise,  it  also  works  with  the  standard  evolution  strategy,  though 
the  results  are  not  shown  here. 

To  develop  a  hybrid  evolution  strategy  using  local  RBF  approximation,  as  we  have 
done  here,  requires  very  little  modification  of  the  actual  evolution  strategy.  The  evolving 
population  itself  is  not  modified,  hut  the  additional  local  search  individual  Ls  added  at 
each  generation.  In  the  approach  of  Tahk  et  al.  (2009)  the  population  is  chosen  symmet¬ 
rically  at  each  generation  and,  as  seen  here,  t  ills  can  interfere  with  the  convergence  of 

CMAES. 

Another  advantage  to  this  approach  is  that  no  minimum  population  size  is  required. 
In  (Tahk  et  <il.  2007.  2009)  the  population  size  must  be  at  least  twice  the  dimension  of 
the  search  space  to  estimate  the  gradient  vector.  The  downside  of  RBFGI  approach  is 
that  it  is  expensive  to  form  the  coefficient  matrix  in  Eq.  <1).  Moreover,  the  size  of  that 
system  scales  as  0(n2),  so  its  solutiou  by  a  direct  method  requires  0(nG)  operations 
per  generation.  Thus,  there  is  a  trade-off  between  the  computational  complexity  of  the 
RBFLSI  method  and  the  gain  due  to  fewer  funct  ion  evaluations.  For  expensive  objective 
functions  the  cost  of  addiug  a  local  search  individual  propagated  by  local  radial  basis 
function  approximation  is  expected  to  l>e  incidental  and  the  increase  in  speed  can  l>e 
enormous.  As  a  result,  the  RBFLSI  approach  should  he  incorporated  into  evolution 
strategies  for  expensive  functions  as  it  can  sometimes  greatly  increase  converge  speed 
and  reliability  with  litt  le  downside. 

In  a  future  work  we  will  consider  a  local  search  based  on  a  trust  region  search  along  the 
Newton  update  direction  on  the  local  radial  basis  function  approximation,  as  in  (Wild 
et  al.  2008)  to  locate  the  local  search  individual.  The  integration  of  a  more  robust  local 
search  may  improve  the  local  convergence  properties  of  this  hybridization  approach. 
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Introduction 

Hydrologic  models,  regardless  of  their  type  (e.g.,  empirical,  physics-based),  often  contain 
parameters  that  cannot  be  measured  directly  either  because  they  have  no  physical  basis,  it 
would  be  impractical,  or  due  to  an  incompatibility  of  scales,  among  other  possible  reasons. 
Hence,  hydrologic  model  parameters  are  inferred  by  adjusting  their  values  until  an  acceptable 
level  of  agreement  is  achieved  between  a  set  of  historical  observations  of  the  real  world  system 
that  the  model  represents  and  their  simulated  counterparts.  While  manual  model  calibration  is 
certainly  one  approach  to  the  problem,  it  is  subjective,  labor-intensive,  and  may  also  suffer 
from  a  lack  of  consistency  and/or  repeatability,  among  others.  Moreover,  it  is  difficult  to 
imagine  how  even  an  experienced  modeler  would  necessarily  manage,  in  a  manual  calibration 
context,  the  large  number  of  estimable  parameters  associated  with  present  day  practice  driven 
complex  hydrologic  model  deployments.  Fortunately,  the  computer-based  calibration  of 
hydrologic  models  (which,  In  contrast  with  the  manual  approach  to  model  calibration  is  more 
objective,  repeatable,  and  better  capitalizes  on  the  computational  capacity  of  the  modern 
computer)  is  an  active  area  of  research  and  development  (see;  for  example,  Baggett  and  Skahill, 
2010a;  Baggett  and  Skahill,  2010b;  Skahill  et  al.,  2009,  Skahill  and  Doherty,  2006;  Doherty  and 
Skahill,  2006,  and  references  cited  therein)  which  has  resulted  in  numerous  automatic 
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calibration  methods  that  are  readily  available  (see  Mattot  et  al.,  2009  and  references  cited 
therein)  for  the  modern  day  hydrologic  modeler  to  employ.  And  the  knowledge  gained  by  their 
application  and  development  has  provided  the  hydrologic  modeling  community  with  a  better 
understanding  of  some  of  the  complications  associated  with  calibrating  hydrologic  models;  viz., 
among  others,  the  existence  of  multiple  local  optima,  non-smooth  objective  function  surfaces, 
and  long  valleys  in  parameter  space  that  are  a  result  of  excessive  parameter  correlation  or 
insensitivity  (Gupta  et  al.,  2003;  Duan  et  al.,  1992). 

As  mentioned,  hydrologic  models  are  typically  calibrated  by  adjusting  parameters  encapsulated 
in  the  simulator  until  there  is  an  acceptable  level  of  agreement  between  a  set  of  historical  data 
and  their  model  simulated  counterparts.  The  parameters  obtained  via  calibration  are  often 
then  used  by  the  model  to  predict  system  behavior  for  one  or  more  pre-defmed  scenarios  of 
interest  to  different  groups  whose  life  or  livelihood  is  rooted  in  the  local  model  study  area. 
Regardless  of  the  calibration  method  employed  and  the  type  (e.g.,  empirical  or  physics-based) 
of  model  used  some  if  not  all  of  the  parameter  values  obtained  through  the  calibration  process 
possess  a  degree  of  quantifiable  uncertainty  because  the  observed  data  contain  measurement 
errors  and  also  because  the  model  never  perfectly  represents  the  watershed  system  or  exactly 
fits  the  observation  data.  And  where  model  parameters  are  uncertain  so  too  are  model 
predictions.  In  particular,  quantifying  uncertainty  supports,  among  others,  the  following 
hydrologic  modeling  activities  (Schoups  and  Vrugt,  2010;  Schoups  et  al.,  2010): 


1.  model  comparison  and  selection, 

2.  identification  of  the  best  water  management  strategies  that  reflect  the  likelihood  of 
outcomes, 

3.  data  collection  aimed  at  improving  hydrologic  predictions  and  water  management,  and 

4.  regionalization  and  extrapolation  of  hydrologic  parameters  to  ungauged  basins. 


For  example,  regarding  item  4  above,  to  quote  Vrugt  et  al.  (2003a),  "If  we  want  to  be  able  to 
regionalize  or  relate  model  parameters  to  easily  measurable  land  or  soil-surface  characteristics, 


a  prerequisite  is  that  the  parameters  be  unique,  preferably  having  a  small  variance.  From  this 
perspective,  it  is  necessary  to  infer  the  parameter  uncertainty  resulting  from  calibration 
studies." 

Model  uncertainty,  characterized  by  the  model  covariance  matrix  calculated  using  the  model 
Jacobian  evaluated  at  the  best  estimate  for  the  model,  can  be  quantified  by  employing  a 
traditional  linear  analysis.  However,  this  approach  is  local  which  does  not  dovetail  well  with  the 
understanding  that  for  hydrologic  models  there  may  exist  many  effectively  equally  acceptable 
models;  i.e.,  it  is  difficult  to  identify  a  unique  best  estimate;  and  moreover,  that  the  set  of  good 
models  may  very  well  not  necessarily  even  be  a  closed  and  bounded  set  interior  to  feasible 
parameter  space.  In  addition,  it  relies  on  a  linearity  assumption  that  is  often  violated  in 
hydrologic  modeling  practice. 

Bayesian-based  approaches  to  model  calibration,  wherein  a  prior  distribution  for  the  model  is 
proposed,  and  the  vector  of  adjustable  model  parameters  is  treated  as  a  random  variable  with 
a  target  probability  distribution  that  is  conditioned  with  observed  data,  are  a  formal  means  to 
obtain  a  realistic  and  reliable  estimate  of  model  uncertainty.  In  particular,  Markov  Chain  Monte 
Carlo  (MCMC)  simulation,  which  is  by  far  more  efficient  than  other  Monte  Carlo  methods,  is 
used  for  inference,  search,  and  optimization  with  hydrologic  models  (Harmon  and  Challenor, 
1997;  Kuczera  and  Parent,  1998;  Campbell  et  al.,  1999;  Campbell  and  Bates,  2001;  Makowski  et 
al.,  2002;  Qian  et  al.,  2003;  Kanso  et  al.,  2003;  Vrugt  et  al.,  2003a;  Vrugt  et  al.,  2003b;  Vrugt  et 
al.,  2008).  With  MCMC  we  are  interested  in  a  target  probability  distribution,  and  its  key 
elements  include  exploration  of  this  distribution  by  way  of  some  sort  of  random  walk  or 
diffusion  process  that  must  be  initialized  in  an  arbitrary  way  because  we  don’t  know  a  priori 
where  good  places  necessarily  are  in  parameter  space.  The  random  walk  is  directed  by  Markov 
chain  simulation  wherein  the  next  step  only  depends  on  the  previous  step,  and  eventually  after 
a  burn  in  period  the  target  distribution  is  identified. 

An  obvious  advantage  of  the  MCMC  method  is  that  no  assumptions  of  model  linearity,  or  even 
of  differentiability  of  model  outputs  with  respect  to  parameter  values,  are  required  for  its 
implementation;  hence  it  is  extremely  robust.  However,  this  robustness  comes  at  a  cost,  this 


being  the  high  number  of  model  runs  required  for  its  implementation.  The  choice  of  the 
proposal  distribution,  which  expresses  prior  information  about  the  model,  can  greatly  affect  the 
efficiency  of  an  MCMC  sampler.  A  poorly  chosen  proposal  distribution  will  result  in  slow 
convergence  to  the  target  distribution.  Unfortunately,  for  complex  hydrologic  models  there  is 
very  little  a  priori  knowledge  of  the  high  probability  density  region  within  parameter  space. 
Hence,  with  hydrologic  models  an  uninformative  prior;  wherein  all  parameters  have  equal 
likelihood,  may  often  be  the  best  we  can  do.  Clearly,  for  hydrologic  modeling,  there  is  a  need  to 
design  efficient  MCMC  samplers,  and  this  observation  was  the  motivation  for  the  development 
of  the  Shuffled  Complex  Evolution  Metropolis  algorithm  (SCEM-UA),  an  effective  and  efficient 
adaptive  MCMC  sampler  which  tunes  the  proposal  distribution  during  the  evolution  to  the 
stationary  posterior  target  distribution  (Vrugt  et  al.,  2003a). 

The  SCEM-UA  algorithm  is  a  modification  to  the  SCE-UA  global  optimization  algorithm  (Duan  et 
al.,  1992;  Duan  et  al.,  1993).  There  are  two  primary  modifications  both  which  prevent  SCEM-UA 
from  collapsing  into  the  region  of  a  single  best  parameter  set.  The  first  modification  involves 
replacement  of  the  downhill  simplex  method  with  the  Metropolis-annealing  scheme 
(Metropolis  et  al.,  1953).  The  second  modification  is  that  SCEM-UA  does  not  further  subdivide 
the  complex  into  subcomplexes  during  the  generation  of  candidate  points  and  it  uses  a 
different  replacement  procedure.  In  presenting  the  SCEM-UA  algorithm,  Vrugt  et  al.  (2003a) 
noted  that  their  principal  focus  was  algorithm  efficiency;  viz.,  the  number  of  simulations 
needed  to  converge  to  the  stationary  posterior  probability  distribution.  They  compared  the 
traditional  Metropolis-Hastings  sampler  (Metropolis  et  al.,  1953;  Hastings,  1970)  with  SCEM-UA 
to  infer  the  posterior  distribution  of  five  parameters  associated  with  a  conceptual  rainfall- 
runoff  model.  It  took  SCEM-UA  approximately  4,000  simulations  to  converge  to  the  stationary 
posterior  distribution,  based  on  evaluation  of  the  Gelman  and  Rubin  convergence  diagnostic 
(Gelman  and  Rubin,  1992);  whereas,  even  after  30,000  simulations  the  Metropolis-Hastings 
algorithm  had  not  converged  to  the  target  distribution  when  applying  the  same  convergence 
criteria. 


An  important  consideration  in  assessing  the  performance  of  model  calibration  software  is  that 
of  run  time.  Model  calibration  software,  no  matter  what  its  algorithmic  basis,  must  run  the 
hydrologic  model  to  be  calibrated  many  times  in  the  course  of  minimizing  the  objective 
function  that  is  used  to  characterize  model-to-measurement  misfit.  Minimizing  the  number  of 
hydrologic  model  runs  required  during  the  calibration  process  is  nearly  always  important,  but 
particularly  when  the  objective  function  landscape  contains  multiple  local  minima  or  hydrologic 
model  run  times  are  high.  Minimizing  the  number  of  required  model  runs  is  the  primary  factor 
driving  the  research  and  development  to  be  discussed  herein,  such  that  the  resulting 
optimization  and  uncertainty  tool  is  compatible  with  the  computationally  expensive  physics- 
based  models  that  are  becoming  more  commonly  used  within  the  practice  community. 

Recent  research  and  development  activity  directed  at  improving  the  efficiency  of  native 
computer-based  model  calibration  algorithms  includes  the  work  of  Skahill  et  al.  (2009)  and 
Baggett  and  Skahill  (2010a, b),  among  others.  Skahill  et  al.  (2009)  developed  an  accelerated 
derivative-based  local  search  algorithm  and  based  on  three  separate  modeling  applications 
demonstrated  efficiency  gains  anywhere  from  36-84%  in  comparison  with  the  native  algorithm. 
Baggett  and  Skahill  (2010a,b)  reported  on  an  efficiency  enhancement  to  the  state-of-the-art 
covariance  matrix  adaption  evolution  strategy  (CMAES)  (Hansen,  2006)  for  global  parameter 
identification  of  difficult  problems  with  noise  or  other  features  that  make  derivatives 
estimation  difficult.  The  increase  in  convergence  speed  was  quite  dramatic  for  their  modified 
CMAES  algorithm,  which  uses  a  local  radial  basis  function  approximation  to  the  objective 
function  to  compute  approximate  first  and  second  derivatives  to  the  objective  function  surface 
to  propagate  a  gradient  individual  alongside  the  evolving  population  for  possible  selection  each 
generation.  Based  on  a  summary  of  thirty  trials,  it  converged  with  fewer  than  half  of  the 
objective  function  evaluations  required  by  CMAES  when  applied  to  calibrate  a  hydrologic 
model. 

The  primary  objective  of  the  research  and  development  encapsulated  in  this  report  was  to 
improve  upon  the  already  existing  documented  efficiency  of  an  existing  state-of-the-art 
Bayesian  model  uncertainty  analysis  method  (Vrugt  et  al.,  2003a).  The  principal  approach  that 


was  employed  to  achieve  the  stated  objective  was  to  simultaneously  and  adaptively  construct 
an  approximation  to  the  objective  function. 


Background 

The  hydrologic  model/can  be  written  as 

9  =/|9;  x)  +  e  (1) 

where  9,  x,  0,  and  e  represent,  respectively,  the  vector  of  model  outputs,  structural  aspects  of 
the  model,  as  well  as  its  input  dataset,  model  parameters  that  are  adjustable  through  the 
calibration  process,  and  the  vector  of  statistically  independent  error  terms  with  zero 
expectation  and  constant  variance  a2.  Given  the  vector  of  observations  y,  the  vector  of 
residuals  is  given  by 

e(0)  =  9-Y  (2) 

Bayesian  statistics  treats  the  model  parameters  0  as  probabilistic  variables  having  a  joint 
posterior  probability  density  function  (pdf),  p(0|y).  The  posterior  pdf  is  a  probabilistic 
statement  about  the  parameters  0  conditioned  on  the  observed  data  y.  At  the  core  of  Bayesian 
inference  is  Bayes'  rule,  which  is  given  by 

p(0|y)  oc  £(y|0)  p(0)  (3) 

where  p{)  indicates  probability,  p(0 1  y)  is  the  posterior  probability  distribution  of  the 
parameters  0,  £(y  1 0)  is  the  likelihood  function,  and  p(0)  is  the  prior  probability  density  function. 
The  prior  pdf,  p(0),  represents  information  about  0  before  any  data  are  collected.  A  critical 
term  in  Bayes'  rule  is  the  likelihood  term;  likelihoods  can  only  be  calculated  if  an  error  model  is 
available.  Assuming  that  the  residuals  are  mutually  independent,  Gaussian  distributed,  with 
constant  variance,  and  further  assuming  a  noninformative  prior  of  the  form  p(0)  -o1,  Box  and 

Tiao  (1973)  derived  the  following  form  for  the  posterior  probability  distribution  of  0: 


P(0|y)«[M(e)]N'1*^  (4) 

where 


M(e)=zr=1(C(0))2/,1+y> 


(5) 


The  Shuffled  Complex  Evolution  Metropolis  Function  Approximation  Algorithm 

The  goal  of  the  Shuffled  Complex  Evolution  Metropolis  (SCEM-UA)  algorithm,  a  Markov  Chain 
Monte  Carlo  sampler  developed  by  Vrugt  et  al.  (2003a)  which  is  a  modified  version  of  the 
original  SCE-UA  global  optimization  algorithm  developed  by  Duan  et  al.  (1992)  was  to  efficiently 
(and  by  efficiency,  we  mean  the  number  of  forward  model  calls  necessary  to  converge  to  the 
target  posterior  distribution)  infer  the  posterior  distribution  of  hydrologic  model  parameters. 
The  SCEM-UA  algorithm  is  not  only  able  to  effectively  infer  the  posterior  distribution  of 
hydrologic  model  parameters  but  also  the  most  likely  parameters  within  this  high  probability 
density  region.  Function  approximation  methods  have  successfully  been  employed  to  improve 
upon  the  efficiency  of  native  evolutionary  strategies  utilized  for  model  calibration;  for  example, 
see  Baggett  and  Skahill  (2010a, b)  and  references  cited  therein.  By  interfacing  function 
approximation  methods  with  the  native  SCEM-UA  algorithm,  we  further  improve  upon  the 
already  existing  reported  efficiencies  of  the  SCEM-UA  MCMC  sampler.  The  new  algorithm 
presented  in  full  herein,  entitled  the  Shuffled  Complex  Evolution  Metropolis  Function 
Approximation  (SCEM-FA)  algorithm,  is  given  below  and  also  illustrated  in  Figures  1  and  2.  The 
SCEM-FA  algorithm  retains  all  of  the  elements  of  the  original  SCEM-UA  algorithm.  And  we  refer 
the  interested  reader  to  Vrugt  et  al.  (2003a)  for  a  thorough  description  and  discussion  of  the 
original  SCEM-UA  algorithm.  However,  for  purposes  of  completeness,  we  present  the  entire 
SCEM-FA  algorithm.  We  will  emphasize  those  parts  which  constitute  the  existing  function 
approximation  interface  to  the  original  SCEM-UA  algorithm.  For  further  clarity  regarding  the 
SCEM-FA  and  SCEM-UA  algorithms,  we  make  mention  now  of  the  fact  that  the  function 
approximation  interface  methodology  presented  herein  is  not  only  possible  with  SCEM-UA,  but 


likely  could  also  easily  be  adapted  and  employed  with  other  MCMC  samplers;  for  example, 
DREAM  (Vrugt  et  al.,  2008). 

1.  Generate  sample.  Sample  s  {0i,  02, ...,  04  points  randomly  from  the  prior  distribution 
and  compute  the  posterior  density  of  each  point  (p(0  ‘‘  |  y),  p(0*'|y), ....  p(0(‘'|y)}  using 
equation  (4). 

2.  Rank  points.  Sort  the  s  points  in  order  of  decreasing  posterior  density  and  store  them  in 
array  D[l:s,  1  :n  *  1]  where  n  is  the  number  of  parameters,  so  that  the  first  row  of  D 
represents  the  point  with  the  highest  posterior  density. 

3.  Build  function  approximation.  Train  a  locally  weighted  projection  regression  function 
(LWPR)  approximation  (Vijayakumar  et  al.,  2005)  using  the  s  points  randomly  sampled 
from  the  prior  distribution.  If  the  sample  is  small,  then  present  the  sample  to  LWPR 
multiple  times  in  random  order. 

4.  Initialize  parallel  sequences.  Initialize  the  starting  points  of  the  parallel  sequences,  S1, 

S2, ....  Sa,  such  that  S*  is  D[/c,  l:n  +  1),  where  k  =  1, 2, ....  q. 

5.  Partition  into  complexes.  Partition  the  s  points  of  D  into  q  complexes  C1,  C2, ....  C°,  each 
containing  m  points,  such  that  the  first  complex  contains  every  q(j-l)+l  ranked  point, 
the  second  complex  contains  every  q(j-l)+2  ranked  point  of  D,  and  so  on,  where  j  =  1,  2, 
....  m. 

6.  Evolve  each  sequence.  Evolve  each  of  the  parallel  sequences  according  to  the  Sequence 
Evolution  Metropolis  Function  Approximation  (SEM-FA)  algorithm  outlined  below. 

7.  Adjust  SEM-FA  input  value  r.  Based  on  the  monitored  acceptance  rate  in  SEM-FA,  and 
predefined  input  values  for  an  acceptance  rate  threshold  for  SEM-FA,  and  the 
occurrence  frequency  for  SEM-FA  input  parameter  adjustment,  update  the  SEM-FA 
input,  r,  a  number  in  the  interval  (0,1)  which  effectively  dials  in  or  out  the  employment 
of  function  approximation  in  SEM-FA. 

8.  Shuffle  complexes.  Unpack  all  complexes  C  back  into  D,  rank  the  points  in  order  of 
decreasing  posterior  density,  and  reshuffle  the  s  points  into  q  complexes  according  to 
the  procedure  specified  in  step  5. 


9.  Check  convergence.  Check  the  Gelman  and  Rubin  convergence  statistic  (Gelman  and 
Rubin,  1992).  If  convergence  criteria  are  satisfied,  stop;  otherwise,  return  to  step  6. 


Items  3,  6,  and  7  above  are  specific  to  the  SCEM-FA  algorithm  while  the  other  elements  are  a 
restatement  of  the  native  SCEM-UA  algorithm  originally  presented  in  Vrugt  et  al.  (2003a).  The 
first  modification,  listed  in  item  3  above,  uses  the  initial  random  sample  to  build  a  function 
approximation  model  which  is  later  used  in  SEM-FA  as  a  surrogate  for  the  objective  function. 
While  locally  weighted  progression  regression  (Vijayakumar  et  al.,  2005)  was  the  function 
approximation  method  used  for  the  current  modifications  to  the  SCEM-UA  algorithm 
documented  in  this  report,  alternative  function  approximation  methods,  such  as  radial  basis 
functions  (Powell,  1992),  could  also  have  been  used.  Item  6  above  refers  to  the  SEM-FA 
algorithm  which  is  presented  and  discussed  below  while  item  7  above  refers  to  the  current 
method  that  is  employed  to  regulate  the  degree  to  which  the  function  approximation  model  is 
utilized  in  SEM-FA.  Both  items  6  and  7  will  be  discussed  further  below. 

As  Vrugt  et  al.  (2003a)  mention,  one  of  the  essential  elements  of  the  SCEM-UA  algorithm  is  the 
Sequence  Evolution  Metropolis  (SEM)  algorithm,  wherein  new  candidate  points  are  produced  in 
each  of  the  parallel  sequences  S*  and  the  Metropolis  algorithm  is  used  to  test  whether  or  not 
the  candidate  point  should  be  added  to  the  current  sequence.  As  part  of  the  overall  effort  to 
further  improve  upon  the  already  existing  reported  efficiencies  of  the  SCEM-UA  MCMC 
sampler,  the  SEM  algorithm  was  adapted  to  include  a  function  approximation  model  which  is 
used  as  a  surrogate  for  the  objective  function.  It  is  named  SEM-FA  for  Sequence  Evolution 
Metropolis  Function  Approximation  and  it  is  presented  below  and  also  in  Figure  2.  As  with  the 
previously  mentioned  comparison  of  the  SCEM-UA  and  SCEM-FA  algorithms,  the  SEM-FA 
algorithm  retains  all  of  the  elements  of  the  original  SEM  algorithm.  And  we  refer  the  interested 
reader  to  Vrugt  et  al.  (2003a)  for  a  thorough  description  and  discussion  of  the  original  SEM 
algorithm.  However,  for  purposes  of  completeness,  we  present  the  entire  SEM-FA  algorithm. 
We  will  emphasize  those  parts  which  constitute  the  existing  function  approximation  interface 
to  the  original  SEM  algorithm. 


I.  Compute  the  mean,  p\  and  covariance  structure  £*  of  the  parameters  of  C*.  Sort  the  m 
points  in  complex  C‘  in  order  of  decreasing  posterior  density  and  compute  I4,  the  ratio 
of  the  posterior  density  of  first  (“best")  to  the  posterior  density  of  the  last  ("worst”) 
member  of  c‘. 

II.  Compute  a',  the  ratio  of  the  mean  posterior  density  of  the  m  points  in  C '  to  the  mean 
posterior  density  of  the  last  m  generated  points  in  S‘. 

III.  If  a'  is  smaller  than  a  predefined  likelihood  ratio,  T,  generate  a  candidate  point,  0"*1!,  by 
using  a  multinormal  distribution  centered  on  the  last  draw,  0",  of  the  sequence.  S',  and 
covariance  structure  cn2l‘,  where  cn  is  a  predefined  jumprate.  Go  to  step  V,  otherwise, 
continue  with  step  IV. 

IV.  Generate  offspring,  0 by  using  a  multinormal  distribution  with  mean  and 
covariance  structure  and  go  to  step  V. 

V.  If  the  random  number  from  the  interval  (0,1),  input  from  SCEM-FA,  is  less  than  r,  then 
use  the  function  approximation  to  compute  the  posterior  dentisty,  p(0(  '!)|y),  of  0"*!l 
using  equation  (4).  If  the  generated  candidate  point  is  outside  the  feasible  parameter 
space,  then  set  p(0,(*ll|y)  to  zero. 

VI.  If  the  random  number  from  the  interval  (0,1),  input  from  SCEM-FA,  is  greater  than  or 
equal  to  r,  then  perform  a  forward  model  call,  compute  the  posterior  dentisty, 

p(911*"  I  y),  of  0|,,i:  using  equation  (4),  and  update  the  LWPR  function  approximation  with 
the  new  data  point  0,:*:|.  If  the  generated  candidate  point  is  outside  the  feasible 
parameter  space,  then  set  p(0,t*1,|y)  to  zero. 

VII.  Randomly  sample  a  uniform  label  Z  over  the  interval  0  to  1. 

VIII.  If  the  random  number  from  the  interval  (0,1),  input  from  SCEM-FA,  is  less  than  r,  then 
go  to  step  IX;  otherwise,  go  to  step  XII. 

IX.  Compute  the  ratio  0  =  pie'1*11  |y)/  pfO^'ly).  If  Z  is  smaller  than  or  identical  to  0,  then 
perform  a  forward  model  call,  compute  the  posterior  dentisty,  pi©41*11 1  y),  of  0'1*11  using 
equation  (4),  and  update  the  LWPR  function  approximation  with  the  new  data  point 
01'*11.  If  the  generated  candidate  point  is  outside  the  feasible  parameter  space,  then  set 
p(0"*1,|y)  to  zero. 


X.  However,  if  Z  is  larger  than  O,  reject  the  candidate  point  and  remain  at  the  current 
position  in  the  sequence,  that  is,  0lt,I!  =  0*".  Go  to  step  XIII. 

XI.  Recompute  the  ratio  O  =  pie"'"  | y)/  p(0'"  |  y).  If  Z  is  smaller  than  or  identical  to  Q,  then 
accept  the  new  candidate  point.  However,  if  Z  is  larger  than  0,  reject  the  candidate 
point  and  remain  at  the  current  position  in  the  sequence,  that  is,  01'*11  =  0"1.  Go  to  step 
XIII. 

XII.  Compute  the  ratio  0  =  p(0':,",,|y)/  p(0(t||y).  If  Z  is  smaller  than  or  identical  to  0,  then 
accept  the  new  candidate  point.  However,  if  Z  is  larger  than  0,  reject  the  candidate 
point  and  remain  at  the  current  position  in  the  sequence,  that  is,  01'*1*  =  0"1. 

XIII.  Add  the  point  0(l*"  to  the  sequence  S*. 

XIV.  If  the  candidate  point  is  accepted,  replace  the  best  member  of  C‘  with  0,ul),  and  go  to 
step  XV;  otherwise  replace  the  worst  member  (m)  of  C1  with  0*1*11,  provided  that  r*  is 
larger  than  the  predefined  likelihood  ratio,  T,  and  p(0'"‘l|y)  is  higher  than  the  posterior 
density  of  the  worst  member  of  C*. 

XV.  Repeat  the  steps  I  -  XIII  L  times,  where  L  is  the  number  of  evolution  steps  taken  by  each 
sequence  before  complexes  are  shuffled. 


Items  I  -  IV,  VI,  VII,  and  XII  -  XV  (with  the  SCEM-FA  input  value  r  set  to  zero)  are  a  restatement 
of  the  native  SEM  algorithm  originally  presented  in  Vrugt  et  al.  (2003a).  The  SCEM-FA  algorithm 
is  equivalent  to  the  original  SCEM-UA  algorithm  when  the  SCEM-FA  input  parameter  r  is  set  to  a 
value  of  zero.  Items  I  -  XV  directly  above  list  the  existing  modification  to  the  original  SEM 
algorithm. 

The  basic  reasoning  behind  SEM-FA  is  that  if  the  function  approximation  prediction,  which  is 
used  as  a  surrogate  for  the  objective  function,  suggests  that  the  candidate  point  should  be 
selected,  by  way  of  evaluation  of  the  Metropolis  algorithm  criterion  (Metropolis  et  al.,  1953), 
then  proceed  ahead  with  a  forward  model  call  and  re-evaluation  of  the  Metropolis  algorithm 
criterion  to  determine  if  in  fact  the  candidate  point  is  to  be  accepted  or  not.  And  if  the 
Metropolis  algorithm  criterion  computed  using  the  function  approximation  prediction  indicates 


otherwise,  then  reject  the  candidate  point.  In  effect,  the  function  approximation  prediction 
serves  as  a  screening  device  in  that  forward  model  calls  are  only  performed  when  it  suggests 
that  it  would  be  beneficial.  And  the  degree  to  which  the  filter  is  applied  is  based  on  a  SCEM-FA 
input  parameter,  r,  which  is  dynamically  adjusted  during  SCEM-FA  execution,  and  its 
comparison  (see  Figure  2)  with  a  unique  randomly  sampled  uniform  label  over  the  interval  0  to 
1  that  is  passed  to  SEM-FA  for  the  evolution  of  each  sequence  (see  Figure  1). 

If  the  SCEM-FA  input  value  for  r  is  greater  than  zero,  then  the  function  approximation 
adaptations  described  above  and  also  shown  in  Figures  1  and  2  will  be  active.  In  this  case,  the 
value  for  r  is  reset  to  zero  at  the  beginning  of  SCEM-FA  execution  and  dynamic  adjustment  is 
subsequently  based  not  only  on  a  comparison  of  the  candidate  point  acceptance  rate  within 
SEM-FA  with  a  user  specified  acceptance  rate  threshold,  but  also  the  integer  value  for  a 
persistence  parameter  which  determines  the  frequency  for  updating  the  value  for  r.  In 
particular,  at  present,  if  it  is  an  opportunity  to  update  r  and  the  SEM-FA  acceptance  rate  is 
less/greater  than  the  user  specified  acceptance  rate  threshold,  then  decrease/increase  the 
value  for  r  by  one-tenth.  The  minimum  possible  value  for  r  is  zero,  and  its  maximum  is 
equivalent  to  its  specified  input  value.  At  present,  an  input  value  is  specified  for  r.  However,  it 
could  possibly  be  effectively  removed  as  an  input  for  SCEM-FA  by  altering  the  existing  dynamic 
adjustment  process  to  simply  allow  the  value  for  r  to  vary  between  zero  and  one.  Testing  is 
needed  to  explore  this  potential  opportunity.  If  it  is  not  already  clear  to  the  reader, 
decreasing/increasing  the  value  for  r  increases/decreases  the  number  of  forward  model  calls 
within  SEM-FA. 

Guidance  for  the  proper  selection  of  SCEM-UA  algorithmic  input  parameter  values  can  be  found 
in  Vrugt  et  al.  (2003a).  The  SCEM-FA  algorithm  contains  three  additional  parameters  that  at 
present  need  to  be  specified  by  the  user.  These  are  (1)  the  random  number  threshold,  r,  (2)  the 
acceptance  rate  threshold,  a,  and  (3)  the  parameter,  p,  an  integer  value  which  determines  the 
update  frequency  for  r.  The  increment/decrement  value  embedded  in  the  dynamic  adjustment 
process  for  r  could  also  be  viewed  as  a  parameter  that  could  possibly  impact  SCEM-FA 
performance.  Further  exploration  in  terms  of  how  these  parameters  affect  the  reliability  (i.e.. 


the  capacity  to  converge  to  the  same  posterior  probability  distribution  as  the  native  SCEM-UA 
algorithm)  and  efficiency  of  SCEM-FA  is  needed  before  any  recommendations  can  be  provided 
for  default  values.  However,  optimal  acceptance  rates  for  MCMC  samplers  range  anywhere 
from  20  -  70%  in  the  literature  (Gallagher  and  Doherty,  2007). 

Additional  opportunities  exist,  of  course,  for  further  exploration  in  terms  of  their  potential 
capacity  to  yield  additional  efficiency  gains  beyond  those  already  achieved  and  documented 
below  with  the  existing  SCEM-FA  implementation.  These  include,  among  others,  (1)  relaxing  the 
current  requirement  to  perform  a  forward  model  call  every  time  the  function  approximation 
suggests  that  the  candidate  point  is  to  be  accepted,  and  (2)  comparing  the  current  function 
approximation  model  with  an  alternative  model,  such  as  radial  basis  functions  (Powell,  1992). 
Both  of  these  opportunities  would  be  modest  development  efforts. 

With  respect  to  the  first  opportunity  noted  directly  above,  at  present,  SCEM-FA  is  biased 
conservatively  in  that  we  completely  trust  the  function  approximation  prediction  to  reject 
candidate  points;  whereas,  if  the  function  approximation  prediction  suggests  that  the  candidate 
point  is  to  be  accepted,  then  we  go  to  additional  measures  to  ensure  that  is  in  fact  the  case. 

One  alternative  would  be  to  simply  accept  the  candidate  point  when  the  function 
approximation  prediction  suggests  it  should;  however,  that  approach  may  be  too  aggressive 
and  impair  the  reliability  of  SCEM-FA.  A  relatively  simple  easily  implementable  approach  would 
be  to  monitor  the  success  rate  of  the  function  approximation  prediction  and  use  that  as  a  basis 
for  deciding  whether  to  perform  a  forward  model  call  after  the  function  approximation 
prediction  suggests  the  candidate  point  is  to  be  accepted.  The  second  opportunity  mentioned 
directly  above  would  be  a  fairly  modest  effort  because  early  SCEM-FA  development  utilized  a  k- 
nearest  neighbor  cubic  radial  basis  function  (RBF)  local  function  approximation  model  rather 
than  locally  weighted  projection  regression  (LWPR). 


STAKI 


Figure  1.  Flowchart  of  the  SCEM-FA  algorithm. 


Figure  2.  Flowchart  of  the  SEM-FA  algorithm  employed  in  the  SCEM-FA  algorithm. 


Case  Study 


To  comprehensively  demonstrate  the  efficiency  gains  that  can  be  achieved  with  the  SCEM-FA 
development  efforts  to  date,  all  the  while  maintaining  consistency  with  respect  to  convergence 
to  the  same  target  distribution,  thirty  unique  instances  of  SCEM-UA  and  SCEM-FA  were  each 
employed  to  infer  the  posterior  distribution  of  thirteen  Sacramento  soil  moisture  accounting 
(SAC-SMA)  hydrologic  model  parameters  using  hydrologic  data  from  the  1944  km‘  Leaf  River 
watershed  near  Collins,  MS.  The  SAC-SMA  hydrologic  model  is  used  by  the  National  Weather 
Service  (NWS)  for  flood  forecasting  throughout  the  United  States.  While  it  has  16  parameters 
that  need  to  be  specified  by  the  user,  consistent  with  previous  work  (see  Vrugt  et  al.,  2003b 
and  references  cited  therein),  13  were  specified  as  adjustable.  The  prior  uncertainty  ranges  of 
the  specified  adjustable  SAC-SMA  hydrologic  model  parameters  are  defined  in  Table  1.  The 
reader  is  referred  to  (see  Vrugt  et  al.,  2003b  and  references  cited  therein)  for  comprehensive 
discussions  regarding  the  SAC-SMA  hydrologic  model,  the  Leaf  River  watershed,  and  also  its 
related  hydrologic  data  (viz.,  mean  areal  precipitation  (mm/day),  potential  evapotranspiration 
(mm/day),  and  streamflow  (m3/s))  that  was  used  to  support  the  effective  inference  of  the 
posterior  distribution  of  the  SAC-SMA  adjustable  model  parameters  and  also  the  most  likely 
parameters  within  this  high  probability  density  region. 

Results  associated  with  the  thirty  trials  are  summarized  in  Tables  2-8  and  Figure  3.  The  results 
presented  in  Tables  2  -  5  are  associated  with  an  earlier  version  of  SCEM-FA  wherein  the  input 
parameter  r  was  fixed  and  not  dynamically  adjusted  as  it  is  now,  based  on  the  candidate  point 
acceptance  rate,  an  acceptance  rate  threshold,  and  the  persistence  parameter,  p,  which 
dictates  the  update  frequency  for  r.  Examining  the  results  in  Table  2,  we  clearly  see  as  one 
would  expect,  improved  efficiency  for  SCEM-FA  relative  to  SCEM-UA  as  the  value  for  r 
increases.  However,  the  improved  efficiency  that  is  obtained  through  more  aggressive 
utilization  of  the  function  approximation  prediction  comes  at  the  cost  of  decreased 
effectiveness  in  terms  of  convergence  to  the  same  posterior  probability  distribution  as  SCEM- 
UA,  evidenced  upon  inspection  of  the  lower  order  statistics  for  the  objective  function  and  SAC- 
SMA  parameter  values  that  are  presented  in  Tables  3-5. 


In  attempts  to  balance  efficiency  with  effectiveness,  different  heuristics  for  controlling  the 
activation  of  the  function  approximation  prediction  within  SCEM-FA  were  subsequently 
implemented,  resulting  in  the  existing  SCEM-FA  implementation  documented  in  this  report. 
Based  on  the  thirty  trials,  the  average  number  of  forward  model  calls  for  SCEM-UA  was  87,253; 
whereas,  it  was  68,642  with  SCEM-FA,  resulting  in  an  approximate  21%  reduction  in  total 
forward  model  calls.  Comparing  lower  order  statistics  associated  with  the  objective  function 
and  related  parameter  values  obtained  from  samples  generated  after  convergence  to  a 
posterior  distribution  has  been  achieved  with  either  the  SCEM-FA  or  SCEM-UA,  as  shown  in 
Tables  6  -  8,  it  is  clear  that  the  existing  SCEM-FA  algorithm  converged  to  the  same  target 
distribution  as  SCEM-UA.  The  results  presented  in  Figure  3,  marginal  posterior  probability 
distributions  of  the  SAC-SMA  model  parameters  based  on  15,000  (500  for  each  of  the  30  trials) 
samples  generated  with  the  SCEM-UA  and  SCEM-FA  algorithms  after  convergence  has  been 
achieved  with  SCEM-UA  and  SCEM-FA,  further  confirm  this  observation.  The  results  presented 
in  Tables  6-8  and  Figure  3  were  obtained  with  SCEM-FA  input  parameters  set  to  r  =  0.9,  a  = 
0.35,  and  p  =  3. 


Parameter 

Minimum 

Maximum 

Unit 

UZTWM 

1 

150 

[mm] 

UZFWM 

1 

150 

[mm] 

UZK 

0.1 

0.5 

day  1 

PCTIM 

0 

0.1 

H 

ADIMP 

0 

0.4 

N 

ZPERC 

1 

250 

H 

REXP 

1 

5 

N 

LZTWM 

1 

500 

[mm] 

LZFSM 

1 

1000 

[mm] 

LZFPM 

1 

1000 

[mm] 

LZSK 

0.01 

0.25 

day1 

LZPK 

0.0001 

0.025 

day1 

PFREE 

0 

0.6 

N 

Table  1.  Prior  Uncertainty  Ranges  of  the  SAC-SMA  Model  Parameters. 


Total  Model  Calls _ 

SCEM-FA 

Random  Number  Threshold  in  SEM-FA 


SCEM-UA 

0.1 

0.3 

0.5 

0.7 

0.9 

Average 

%  reduction 

87253 

79379 

9.0 

70602 

19.1 

66852 

23.4 

55158 

36.8 

50090 

42.6 

Table  2.  Summary  of  efficiency  for  an  earlier  version  of  SCEM-FA,  relative  to  SCEM-UA,  for  fixed 
values  of  r.  Results  are  based  on  thirty  unique  instances  of  the  earlier  version  of  SCEM-FA  and 
also  SCEM-UA. 


Method 

Objective  Function  Values 
Average  Standard  Deviation 

SCEM-UA 

13.31669413 

0.02331727 

SCEM-FA  (r=0.1) 

13.3073988 

0.030142989 

SCEM-FA  (r=0.3) 

13.35390347 

0.21756169 

SCEM-FA  (r=0.5) 

13.3382782 

0.218278563 

SCEM-FA  (r=0.7) 

13.42101667 

0.325857754 

SCEM-FA  (r=0.9) 

13.6533664 

0.252643413 

Table  3.  Summary  of  objective  function  value  lower  order  statistics  for  an  earlier  version  of 
SCEM-FA,  relative  to  SCEM-UA,  for  fixed  values  of  r.  Each  individual  result  is  based  on  thirty 
unique  instances  for  the  earlier  version  of  SCEM-FA  and  also  SCEM-UA,  in  particular,  15,000 
(500  for  each  of  the  30  trials)  samples  generated  after  convergence  to  a  posterior  distribution 
has  been  achieved  with  either  the  SCEM-FA  or  SCEM-UA. 


Muted 

UZTWM 

uz-vvv 

tCTM 

A0IMP 

SVojI  Fararrfttrt 

7HK  LZTWM 

IZFSM 

IZPM 

125* 

lZP< 

mtt 

SCEM-UA 

26.8812 

29-5574 

0-3811 

0  COIG 

0-2722 

aS.3623 

3.7797 

250.7199 

4a.i:i2 

l  OS-5772 

C.2354 

0.02  » 

01502 

SCEM-FA  <i=0.1| 

26.627? 

19-28*2 

0-3798 

(L0C87 

0-2762  . 

106.1752 

3.7819 

249.4670 

48.6153 

105.7939 

0.2367 

0.0199 

0.1472 

5CEM TA  (r-0.3| 

*4.7163 

35.9214 

0.3405 

0.057 

0-2fc*5  . 

IC4.Q5S5 

3.7810 

248.0050 

97.5675 

116.9293 

0.2347 

cxtoi 

01571 

SCFM-FA  (H)i5| 

35.2474 

36  3110 

0  3448 

0068 

0  2847 

K9  61C* 

3W05 

7498637 

97.2791 

1365201 

02432 

00108 

01629 

$<EM-FA(r-0.7| 

51.6347 

456982 

03906 

0098 

03013 

mom 

39138 

239  9127  167.4767  199  7868 

0.2422 

00177 

01853 

SaV-FA  <r=0.9i 

111.1248 

66^665 

0.4C62 

00214 

03526 

12S.I139 

4  1614 

ICO.  3526 

218.5091 

212-511? 

C.2413 

00152 

Q2B03 

Table  4.  Summary  of  SAC-SMA  average  parameter  values  obtained  from  an  earlier  version  of 
SCEM-FA,  for  fixed  values  of  r,  and  also  SCEM-UA.  Each  individual  result  is  based  on  thirty 
unique  instances  for  the  earlier  version  of  SCEM-FA  and  also  SCEM-UA,  in  particular,  15,000 
|S00  for  each  of  the  30  trials)  samples  generated  after  convergence  to  a  posterior  distribution 
has  been  achieved  with  either  the  SCEM-FA  or  SCEM-UA. 


Mlltind 

l/ZTWM 

U2FWM 

UZ< 

»CT1M 

40  IMP 

Ad/Jttifc*  fAidd  Pjr*ratfifi 

Sii^aird  Dttlxfon  ViAjit 

2WRC  OCX?  lZT\SM 

12/  SM 

12/PM 

3 

9 

L2PK 

PfltEI 

5CFM-UA 

4  2085 

26528 

00512 

00726 

00262 

26.5712 

04820 

109419 

11  6194 

11.3510 

0C085 

00023 

00347 

SCEM-FA  |r=0-l> 

4.2885 

3.3627 

0.0526 

O.O!  2  4 

0.0295 

25.2751 

0.4926 

11.2280 

11X630 

11.1780 

QXG88 

0X027 

0X345 

SCEM-FA  |r=0-3> 

79.4533 

25.9629 

0.0580 

O.OI85 

00396 

32.0259 

0-5540 

15.1575 

1829817 

122.1159 

0X075 

0X054 

0X478 

SCEM-FA  (r=&S) 

29.4966 

26.4420 

CL0573 

0.0181 

0.0369 

32.24*1 

OJ393 

146277 

18X8541 

124.16S0 

0X066 

0X053 

0X386 

SCEM-FA  (r-0  7) 

4774*) 

404475 

00677 

00133 

0CU86 

405929 

07078 

731352 

2<94427  209KX8 

00*53 

00076 

00717 

47.1152 

569383 

0.«2 

00130 

00391 

46.5461 

0  7981 

33  2479 

2976354  208.1662 

OCC90 

00080 

00967 

Table  5.  Summary  of  standard  deviations  associated  with  SAC-SMA  parameter  values  obtained 
from  an  earlier  version  of  SCEM-FA,  for  fixed  values  of  r,  and  also  SCEM-UA.  Each  individual 
result  is  based  on  thirty  unique  instances  for  the  earlier  version  of  SCEM-FA  and  also  SCEM-UA, 
in  particular,  15,000  (500  for  each  of  the  30  trials)  samples  generated  after  convergence  to  a 
posterior  distribution  has  been  achieved  with  either  the  SCEM-FA  or  SCEM-UA. 


Method 

Objective  Function  Values 

Average 

Standard  Deviation 

SCEM-UA 

13.31669413 

0.02331727 

SCEM-FA 

13.27271167 

0.02663205 

Table  6.  Summary  of  objective  function  value  lower  order  statistics  for  SCEM-FA,  relative  to 
SCEM-UA.  Each  individual  result  is  based  on  thirty  unique  instances  for  SCEM-FA  and  also 
SCEM-UA,  in  particular,  15,000  (500  for  each  of  the  30  trials)  samples  generated  after 
convergence  to  a  posterior  distribution  has  been  achieved  with  either  the  SCEM-FA  or  SCEM- 
UA. 


Adiiatabfc  Model  Parameter; 

Method 

UZTWM 

UZPiVM 

UZK 

AOMP 

Average  v&*jei 

ZP£*C  REXP  IZTWM 

LZfSM 

L2FPM 

12$< 

LZ(>< 

PFREE 

SCEM-UA 

26.6812 

29.5574 

0  3811 

0.0010 

0.2722 

203.3623  3  7797  2 SO  7199 

48.1212 

105.5772 

0.2364 

00200 

(1.1502 

SCEM-FA 

27.3790 

29.1978 

0  3919 

0.0033 

0-2780 

216.8956  5  9589  252  2869 

50.7311 

105.1115 

0.2408 

00212 

0.1565 

Table  7.  Summary  of  SAC-SMA  average  parameter  values  obtained  from  SCEM-FA  and  also 
SCEM-UA.  Each  individual  result  is  based  on  thirty  unique  instances  for  SCEM-FA  and  also 
SCEM-UA,  in  particular,  15,000  (500  for  each  of  the  30  trials)  samples  generated  after 
convergence  to  a  posterior  distribution  has  been  achieved  with  either  the  SCEM-FA  or  SCEM- 
UA. 


^dlialsWe  Model  Parameter; 

Method 

JZTWM 

UZFiVM 

UZK 

PC  TLVi 

ADIMP 

Standard  Deviation  Vatas 

2PERC  RtXP  IZTWM 

IZ?SM 

LZFPM 

l&C 

L  ZFK 

PFREE 

SC£M*UA 

4.2085 

2.6S22 

0.0512 

0.0026 

0.0262 

26^712  0.4820  10.9419 

11.6494 

11.3S10 

0.CCB5 

0.0025 

0.0242 

SCEM-FA 

3.2617 

2  615<3 

0  0493 

0.0020 

0.0228 

196017  0  4135  9  8273 

114531 

10.1219 

00360 

0.CO20 

0.019B 

Table  8.  Summary  of  standard  deviations  associated  with  SAC-SMA  parameter  values  obtained 
from  SCEM-FA  and  also  SCEM-UA.  Each  individual  result  is  based  on  thirty  unique  instances  for 
SCEM-FA  and  also  SCEM-UA,  in  particular,  15,000  (500  for  each  of  the  30  trials)  samples 
generated  after  convergence  to  a  posterior  distribution  has  been  achieved  with  either  the 
SCEM-FA  or  SCEM-UA. 


Figure  3.  The  marginal  posterior  probability  distributions  of  the  SAC-SMA  model  parameters 
inferred  for  the  Leaf  River  watershed  using  the  15,000  (500  for  each  of  the  30  trials)  samples 
generated  with  the  SCEM-UA  (1M  and  2'*  columns)  and  SCEM-FA  ( 2"a  and  4m  columns) 
algorithms  after  convergence  has  been  achieved  with  SCEM-UA  and  SCEM-FA. 


Summary 


This  report  began  by  outlining  the  need  for  hydrologic  model  calibration  and,  related,  the 
realistic  assessment  of  hydrologic  model  uncertainty,  which  has  many  benefits,  including  model 
comparison  and  selection,  identification  of  the  best  water  management  strategies  that  reflect 
the  likelihood  of  outcomes,  data  collection  aimed  at  improving  hydrologic  predictions  and 
water  management,  and  regionalization  and  extrapolation  of  hydrologic  parameters  to 
ungauged  basins.  Bayesian-based  approaches  to  model  calibration,  in  particular  Markov  Chain 
Monte  Carlo  (MCMC)  simulation  methods,  are  a  formal  means  to  obtain  a  realistic  and  reliable 
estimate  of  model  uncertainty.  However,  their  application  comes  at  a  computational  cost.  For 
hydrologic  modeling,  it  was  noted  that  there  is  a  need  to  design  efficient  MCMC  samplers,  and 
this  observation  was  in  fact  the  motivation  for  the  development  of  the  Shuffled  Complex 
Evolution  Metropolis  algorithm  (SCEM-UA)  (Vrugt  et  al.,  2003a).  The  primary  objective  of  the 
research  and  development  encapsulated  in  this  report  was  to  improve  upon  the  already 
existing  documented  efficiency  of  the  state-of-the-art  Bayesian  model  uncertainty  analysis 
method  SCEM-UA  (Vrugt  et  al.,  2003a).  As  with  other  recent  research  and  development  activity 
that  was  directed  to  enhancing  the  efficiency  to  the  state-of-the-art  covariance  matrix  adaption 
evolution  strategy  (CMAES)  (Baggett  and  Skahlll,  2010a;  Baggett  and  Skahill,  2010b),  the 
principal  approach  that  was  employed  to  achieve  that  stated  objective  was  to  simultaneously 
and  adaptively  construct  an  approximation  to  the  objective  function. 

The  report  followed  with  some  brief  background  material  and  then  a  description  of  the  current 
methodology  that  is  employed  for  interfacing  a  function  approximation  model  with  the  native 
SCEM-UA  algorithm  to  improve  upon  its  already  existing  documented  efficiency.  Thereafter, 
based  on  a  comprehensive  set  of  thirty  trials  using  the  SAC-SMA  soil  moisture  accounting 
hydrologic  model  and  local  hydrologic  data  for  the  Leaf  River  watershed  near  Collins,  MS,  it  was 
clearly  demonstrated  that  SCEM-FA  was  able  to  achieve,  on  average,  a  21%  reduction  in  total 
model  calls  while  inferring  the  same  posterior  probability  distribution  as  that  obtained  with 
SCEM-UA. 


Several  opportunities  exist  for  future  development  (and  likely  additional  efficiency  gains)  and 
also  application.  Numerical  experiments  are  needed  to  explore  how  the  random  number 
threshold,  r,  the  acceptance  rate  threshold,  a,  the  parameter,  p,  an  integer  value  which 
determines  the  update  frequency  for  r,  and  the  increment/decrement  value  embedded  in  the 
dynamic  adjustment  process  for  r  impact  overall  SCEM-FA  performance,  relative  to  SCEM-UA,  in 
terms  of  efficiency  and  reliability.  A  relatively  simple  and  easily  implementable  approach  that 
would  likely  yield  additional  efficiency  gains  for  the  current  implementation  of  SCEM-FA  would 
be  to  monitor  the  success  rate  of  the  function  approximation  prediction  and  use  that  as  a  basis 
for  deciding  whether  to  perform  a  forward  model  call  after  the  function  approximation 
prediction  suggests  the  candidate  point  is  to  be  accepted.  Early  SCEM-FA  development  utilized 
a  k-nearest  neighbor  cubic  radial  basis  function  (RBF)  local  function  approximation  model 
rather  than  locally  weighted  projection  regression  (LWPR).  It  would  be  interesting  to  explore 
how  the  two  different  function  approximation  models  impact  overall  SCEM-FA  performance, 
relative  to  SCEM-UA,  in  terms  of  efficiency  and  reliability.  Moreover,  it  could  be  of  potential 
benefit  to  explore  ways  in  which  the  confidence  estimate  associated  with  the  LWPR  function 
approximation  prediction  could  be  beneficially  used  to  improve  overall  SCEM-FA  performance, 
relative  to  SCEM-UA,  in  terms  of  efficiency  and  reliability.  Additional  case  studies  are  needed  to 
further  document  SCEM-FA  performance  in  terms  of  efficiency  relative  to  SCEM-UA.  And  future 
applications  need  to  focus  on  model  prediction  uncertainty.  As  was  mentioned  earlier,  the 
methods  reported  upon  in  this  report  should  be  relatively  easy  to  transfer  to  other  MCMC 
methods.  It  is  our  intent  to  explore  just  that  with  the  DREAM  MCMC  sampler,  particularly  in 
light  of  potential  balance  issues  that  have  been  presented  regarding  the  SCEM-UA  algorithm 
(Vrugt  et  al.,  2008).  The  code  for  the  SCEM-FA  algorithm  is  available  from  the  first  author. 
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